# Getting started with Python

The demand for data science skills has grown rapidly in recent years. Across domains, data science methods have helped businesses turn data into valuable insights. This has led to an explosion of data science tools and techniques, which in turn has created countless opportunities. For the same reason, however, starting out in this field can be daunting. If you want to become a data science practitioner but do not have any experience, the first questions will be:
Where do I start? Which skill should I learn first? 

The best answer is to start with Python. Here is why:<br>
Python is a simple, easy-to-learn, general-purpose programming language that supports object-oriented programming. Scientists and mathematicians have used Python since its early days, which is why it is popular for analytical tasks. Python has quickly become the go-to language for data scientists all over the world.

Python is an interpreted, interactive, object-oriented programming language.

In this micro course, we will cover the following topics:

<li>Features of Python language</li>
<li>Why is Python a Favorite of Data Scientists </li>
<li>Python Standard Library </li>
<br>   

## Features of the Python language


### Python is easy

For example consider the code below:

```python
print('Hello world')
```

The above code does what it reads: it prints the text "Hello world".
Indentation is a built-in feature of the Python language, which makes code easy to read. For example:

```python
def sayHello():
    print('Hello world')
# Then we can call the function to see the result:
sayHello()
```
In the above code snippet, the function definition line is followed by an indented statement, print in this case. The indentation indicates that the statement is part of the function definition.

Here is a simple exercise that you can practice in the following code cell.<br>
Define a function "hiPython()" that prints "Hi everyone, this is Python". Then call the function to see the result.

#### Loop
In general, statements are executed sequentially: the first statement in a function is executed first, followed by the second, and so on.
Sometimes you need to execute a block of code several times. Programming languages provide various control structures that allow for more complicated execution paths.

A loop statement allows us to execute a statement or group of statements multiple times.

Here we introduce a simple **for loop**. It will execute a sequence of statements multiple times and abbreviates the code that manages the loop variable.


Syntax: <br>
range(1,5) -- generates the numbers from 1 to 4 (the end value 5 is not included).

Example:
```python
for i in range(1,5):
    print(i, "This is Python!")
# Output
>>> 1 This is Python!
>>> 2 This is Python!
>>> 3 This is Python!
>>> 4 This is Python!
```
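To see exactly which numbers `range` produces, it can help to convert the result to a list. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
# range(start, stop) counts from start up to, but not including, stop
numbers = list(range(1, 5))
print(numbers)  # [1, 2, 3, 4]

# An optional third argument sets the step between numbers
odd_numbers = list(range(1, 10, 2))
print(odd_numbers)  # [1, 3, 5, 7, 9]

# With a single argument, counting starts at 0
first_three = list(range(3))
print(first_three)  # [0, 1, 2]
```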
#### Exercise
Use a for loop to call the function "hiPython()" you created above five times.

When we want to put a placeholder for a variable in a string and pass the value separately, we can use a **format specifier**. This is a set of symbols that describes the format in which the passed value is to be printed. Format specifiers for the various data types in Python are similar to those in other programming languages:

* %d for integer
* %f for float
* %s for string
* %r for the raw representation of an object, as returned by repr()
* %x for hex code

We can use the format specifiers as placeholders in a print statement, and pass the value of a specific variable using the '%' operator. Multiple variables or values can be passed by enclosing them in parentheses and separating them with commas.

An example:

```python
a = 10
b = 5

print("I want to print %d and %d in the same sentence"%(a,b))

# Output
>>> I want to print 10 and 5 in the same sentence
```

Another example:

```python
a = 10.264725
b = "a string type"

print("I want to print %f and %s in the same sentence"%(a,b))

# Output
>>> I want to print 10.264725 and a string type in the same sentence
```
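The remaining specifiers from the list, %r and %x, work the same way. A short sketch, with variable names chosen only for illustration:

```python
name = "Python"
number = 255

# %s uses str(), while %r uses repr() (note the quotes around the string)
as_str = "%s" % name
as_repr = "%r" % name
print(as_str, as_repr)  # Python 'Python'

# %x prints an integer in hexadecimal
as_hex = "%x" % number
print(as_hex)  # ff
```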
As an alternative to the '%' operator, we can use the string method format().

```python
a = 10
b = 5

print("I want to print {:d} and {:d} in the same sentence".format(a,b))

# Output
>>> I want to print 10 and 5 in the same sentence
```

```python
a = 10.264725
b = "a string type"

print("I want to print {:f} and {:s} in the same sentence".format(a,b))

# Output
>>> I want to print 10.264725 and a string type in the same sentence
```
#### Exercise
Print out the sentence "Hi everyone! I learned how to print 1 and 1.0." Use **format specifier** for the integer "1", the float "1.0", and the string "Hi everyone!".

If we want to specify the number of digits after the decimal point in the above example, we need to say so in the format specifier. Changing ```{:f}``` to ```{:.1f}``` in the print() call gives the float with a single digit after the decimal point.
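As a small illustration of the difference between the two specifiers (the value and variable names here are only examples):

```python
a = 10.264725

# {:f} prints six digits after the decimal point by default
default_format = "{:f}".format(a)
# {:.1f} rounds to one digit after the decimal point
one_decimal = "{:.1f}".format(a)

print(default_format)  # 10.264725
print(one_decimal)     # 10.3
```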
#### Rounding off a float and adding preceding zeroes

In the above example, where we had a float value, we may want to restrict the number of decimals and print a rounded value. We may also want to add preceding zeroes, for instance when the value is a single observation in a larger pool of numbers that have 3 digits before the decimal point.
We can add preceding zeroes and round off float values by specifying the formatting after the '%' symbol (or the ':' when using format()) and before the 'f' symbol of the format specifier.
* Use a period symbol ('.') to denote the decimal point in the float value
* For preceding zeroes, add the total desired length of the number in characters (including the decimal point) before the period symbol. Before this number, add a '0' to indicate that all preceding blank spaces should be filled with zeroes.
* For rounding off decimals, add the number of decimal places you want right after the period symbol and before the 'f' symbol

<img src="format_specifiers.PNG" style="width:50vw">

A few examples:
```python
a = 10.264725 

print('''Variations of a floating point:
1. {:.2f}
2. {:.3f}
3. {:4.4f}
4. {:04.4f}
5. {:8.4f}
6. {:08.4f}
7. {:010.4f}'''.format(a,a,a,a,a,a,a))

# Output
>>> Variations of a floating point:
>>> 1. 10.26
>>> 2. 10.265
>>> 3. 10.2647
>>> 4. 10.2647
>>> 5.  10.2647
>>> 6. 010.2647
>>> 7. 00010.2647
```

<b>Important note:</b> In the above example, '{:4.4f}' and '{:04.4f}' give the same output, with no visible padding (neither blanks nor zeroes). This is because the width we specified is 4, while we also asked for 4 digits after the decimal point. The digits after the decimal point take precedence, so the number is first rendered as '10.2647', which is 7 characters long.
The width is only a minimum: padding is added only when the formatted number is shorter than the specified width. Since the number is already longer than 4 characters, no padding blanks or zeroes are added.
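The same width and precision rules apply to the '%' operator. The sketch below repeats two of the cases with '%' and checks the resulting lengths; the variable names are only illustrative.

```python
a = 10.264725

# '%08.4f': pad with zeroes to a total width of 8, keep 4 decimals
padded = "%08.4f" % a
print(padded)  # 010.2647

# Width 4 is smaller than the 7 characters of '10.2647', so nothing is padded
unpadded = "%04.4f" % a
print(unpadded)  # 10.2647

# The width counts every character, including the decimal point
print(len(padded), len(unpadded))  # 8 7
```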

For more details on formatting refer to: https://pyformat.info/

#### Exercise

Given variables - 
* a = 5
* b = 134.264262
* c = "Hello! How are you?"

Print the following using appropriate format specifiers
* the integer and a floating point equivalent of a
* the value of b, up to one decimal place and padded with two preceding zeroes
* the string c, truncated up to first 10 characters

a = 5
b = 134.264262
c = "Hello! How are you?"
# write your code below:


### Well Supported and Widely Used

Python is hugely popular and widely used for many types of applications. It is available on all major operating systems and platforms. Many libraries are available in Python, including statistics and machine learning libraries. Python also has wrappers for various APIs and for native components written in C, which makes it a versatile language.

### General Application Programming

Python can be used to create command-line tools, web apps and cross-platform GUI applications. GUI modules such as tkinter can be used to develop GUI apps, and packages such as cx_Freeze and PyInstaller can deploy them as self-contained executables.

### Web Application Development

Python can be used to develop enterprise web applications using web frameworks such as Flask and Django.


### Data Persistence and Data Handling in Python

Python has built-in support for data persistence through the sqlite3 module, which wraps the SQLite library. It also supports JSON handling (https://docs.python.org/2/library/json.html) and XML parsing using DOM and SAX parsers.

For additional help with the sqlite3 module in Python, refer to the link: https://docs.python.org/2/library/sqlite3.html
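To make this concrete, here is a minimal sketch that stores one row in an in-memory SQLite database and round-trips a dictionary through JSON. The table name, column names and values are made up for illustration.

```python
import json
import sqlite3

# An in-memory database disappears when the connection is closed;
# pass a file name instead to keep the data on disk
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE courses (name TEXT, hours INTEGER)")
connection.execute("INSERT INTO courses VALUES (?, ?)", ("Python basics", 3))
connection.commit()

rows = connection.execute("SELECT name, hours FROM courses").fetchall()
print(rows)  # [('Python basics', 3)]
connection.close()

# Convert a Python dictionary to a JSON string and back again
course = {"name": "Python basics", "hours": 3}
text = json.dumps(course)
restored = json.loads(text)
print(restored == course)  # True
```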

### Object-Oriented Implementation
Python supports object-oriented (OO) programming. A later section of this course introduces Python objects. OO programming involves creating a class, which bundles data together with a set of methods, defined inside the class, for working with that data.
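As a first taste, a class might look like the sketch below. The class name `Greeter` and its method are hypothetical and exist only for this example.

```python
class Greeter:
    """A small example class; the name and methods are made up for illustration."""

    def __init__(self, name):
        # __init__ runs when an object is created and stores its data
        self.name = name

    def greet(self):
        # Methods are functions that belong to the class
        return "Hi everyone, this is " + self.name


greeter = Greeter("Python")
message = greeter.greet()
print(message)  # Hi everyone, this is Python
```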


### Dynamically Typed

Python is dynamically typed (often described as duck typed), meaning that data types need not be declared, unlike in statically typed languages such as C or Java.

For programmers from a Java or C background, there is an optional static type checker for Python called mypy. It is not a separate version of Python: you add type hints to ordinary Python code, and mypy checks them before the program runs.

For examples, please refer to mypy: http://mypy-lang.org/examples.html.
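The sketch below illustrates both ideas: a name that is rebound to values of different types, and a function with optional type hints. The function `add` is a made-up example, and the hints are not enforced by Python itself at run time.

```python
# The same name can refer to values of different types over time
value = 10
first_type = type(value).__name__
value = "ten"
second_type = type(value).__name__
print(first_type, second_type)  # int str


# Optional type hints document the expected types; Python does not
# enforce them at run time, but a static type checker can report mismatches
def add(x: int, y: int) -> int:
    return x + y


total = add(2, 3)
print(total)  # 5
```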

<br>

## Why is Python a favorite of Data Scientists?


Python has gained popularity within the data science community over the last decade. It allows for quick prototyping of machine learning models, with packages for data manipulation (such as pandas), data visualization (such as matplotlib, Plotly and Bokeh) and model building (such as scikit-learn and SciPy).



## The Python Standard Library

Python’s standard library is very extensive, offering a wide range of facilities. The library contains built-in modules (written in C) that provide access to system functionality such as file I/O that would otherwise be inaccessible to Python programmers, as well as modules written in Python that provide standardized solutions for many problems that occur in everyday programming. Some of these modules are explicitly designed to encourage and enhance the portability of Python programs by abstracting away platform-specifics into platform-neutral APIs.

The standard library contains a variety of components such as data types, built-in functions and exceptions, and a collection of modules. Some modules are written in C and built in to the Python interpreter; others are written in Python and imported in source form.

Some of the categories and built-in libraries are included below:

### Built-in Functions
 
 Some of the typical built-in functions are:<br>
 print() - for printing to system output<br>
 str() - to convert an object to a printable string<br>
 int() - to return an integer object from a number or string (with digits only)<br>
 file() - constructor for a file object representing a file (Python 2 only; it was removed in Python 3, where open() is used instead)<br>
 open() - to open a file for reading, writing or appending<br>
 
 etc.
 
 For an exhaustive list of built-in functions please refer to this link : https://docs.python.org/2/library/functions.html#built-in-funcs
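A few of these built-in functions in action, as a minimal Python 3 sketch. The file name `example.txt` is hypothetical, and the file is written to a temporary folder so that nothing is left behind.

```python
import os
import tempfile

# str() and int() convert between text and numbers
text = str(42)
number = int("42") + 1
print(text, number)  # 42 43

# open() returns a file object; 'with' closes the file automatically
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")
    with open(path, "w") as handle:
        handle.write("Hello world")
    with open(path) as handle:
        content = handle.read()

print(content)  # Hello world
```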
 
 
### Built-in Types

The Python language contains several built-in types. The principal built-in types are <b>numerics, sequences, mappings, files, classes, instances</b> and <b>exceptions.</b> The descriptions below follow the Python 2 documentation linked at the end of this section; differences in Python 3 are noted where they apply.

In Python 2 there are four distinct numeric types: <b>plain integers, long integers, floating point numbers,</b> and <b>complex numbers.</b> In addition, Booleans are a subtype of plain integers. Python 3 merges plain and long integers into a single int type.

In Python 2 there are seven sequence types: <b>strings, Unicode strings, lists, tuples, bytearrays, buffers,</b> and <b>xrange objects.</b> For example, string literals are sequences written in single or double quotes: 'xyzzy', "frobozz". Unicode strings are much like strings, but are specified in the syntax using a preceding 'u' character: u'abc', u"def". In Python 3, all strings are Unicode and xrange has been replaced by range.

There is also <b>dict</b>, which is a mapping type, and <b>set</b>, which is one of the set types in Python.

Callable types are types that can be invoked from Python statements, such as <b>user-defined functions, user-defined (class) methods, class types</b> and <b>class instances</b> (when the class defines a __call__ method).

<img src="built_in_types.png" style="width:30vw">


For more details refer to this link : https://docs.python.org/2/library/stdtypes.html
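To explore the built-in types interactively, `type()` reports the type of any value. The sketch below assumes Python 3, so the type names differ slightly from the Python 2 names used in the text above.

```python
values = [42, 3.14, 2 + 3j, "abc", [1, 2], (1, 2), {"a": 1}, {1, 2}, True]

# Collect the name of each value's type
type_names = [type(v).__name__ for v in values]
print(type_names)
# ['int', 'float', 'complex', 'str', 'list', 'tuple', 'dict', 'set', 'bool']

# Booleans are a subtype of integers
bool_is_int = isinstance(True, int)
print(bool_is_int)  # True

# callable() reports whether an object can be invoked like a function
print(callable(len), callable(42))  # True False
```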

### String operations

Python strings come with built-in methods such as lower() and upper(), which convert a string to lower case or upper case respectively, and format() for string formatting. The standard library's string module provides further helpers.<br> 
The following example shows how string operations are different from numerical operations, as well as converting a lowercase string to uppercase:

```python
a = 1 + 2            # numeric addition
b = str(1) + str(2)  # string concatenation
print(a, b)          # 3 12
c = 'abc'
print(c.upper())     # ABC
```
Try to concatenate two strings "Hello " and "World!", then convert all letters to lower case. You can try your code in the following cell.

Other packages in the standard library pertain to:

<li> Data Type operations</li>

<li> Numerical & Mathematical operations </li>

<li> File Access operations </li>

For detailed explanations of each component of standard library, see this link: https://docs.python.org/2/library/index.html#library-index 

### How to install additional libraries?

As you progress in your career, you will usually need more tools than the standard library provides.
In that case you need to install additional libraries. The simplest way to do so is the 'pip install' command. <br>
pip acts as a package manager. If the [anaconda installation](https://www.anaconda.com/distribution/) is successful, pip can be used to add additional libraries of your choice. It is a terminal command and hence should be run at the terminal/command line prompt. <br>
Command: 'pip install (name of package)'<br>

For instance: 
<li>pip install keras</li>
<li>pip install numpy</li>
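After installing, you may want to confirm from Python that a package can be found. One possible way, sketched here with the standard library's `importlib`, is shown below; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# json ships with the standard library, so it is always available
json_available = is_installed("json")
print(json_available)  # True

# For third-party packages the result depends on what has been installed
print(is_installed("numpy"))
```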

Our data science course at https://refactored.ai has detailed steps for installing the Anaconda distribution. Feel free to check it out.


## Python version

### Python installation check
To confirm that Python is installed on our OS, we use the terminal/command prompt.
Open the terminal and type the command<br><br> ```python --version```<br><br>

<b>Note:</b> If you used the Anaconda distribution to install Python, launch the Anaconda prompt and type the above command.

<b>For standalone Python installation:</b>
<img src="standalone_py_ver.png" style="width:50vw">


<b>For Python installation using Anaconda:</b>
<img src="anaconda_py_ver.PNG" style="width:50vw">
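The version can also be checked from inside Python, which is handy in a notebook. A minimal sketch using the standard `sys` module:

```python
import sys

# sys.version_info holds the version as numbers (major, minor, ...)
major = sys.version_info.major
minor = sys.version_info.minor
print("Running Python %d.%d" % (major, minor))

# A simple guard for code that relies on Python 3 behaviour
is_python3 = major >= 3
print(is_python3)
```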

### Python 3 vs Python 2
There are many differences between Python 3 and Python 2. This section looks at a few small ones so that data scientists can get an idea of how to practice analysis regardless of the version.

#### Print Statements
```Python
# only in version 2
print 'Data Science'

# version 2 and version 3
print('Data Science')
```

It is possible to bridge the two versions.<br>
For instance, if you have version 2 but need to work with the version 3 syntax, for work practices or other reasons,
you can do it this way:

```Python
# version 2
from __future__ import print_function
print('Data Science')
```

Another popular example is division.<br>
In <b>version 2</b>, division between two integers is floor division by default, so the result is rounded down to an integer:
```Python
print 5/2
>>> 2
```
But in <b>version 3</b>, the / operator always performs true division, and // performs floor division:
```Python
print(5/2)
>>> 2.5

print(5//2)
>>> 2
```

Version 2 users who want the version 3 behaviour<br>
can get it in this manner:
```Python
from __future__ import division
print(5/2)
>>> 2.5
print(5//2)
>>> 2
```
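Floor division rounds down rather than toward zero, which is easiest to see with a negative number. A short Python 3 sketch (variable names are only illustrative):

```python
# In Python 3, / always returns a float and // rounds down (floor)
true_division = 5 / 2
floor_division = 5 // 2
print(true_division, floor_division)  # 2.5 2

# Rounding down matters for negative numbers: -2.5 becomes -3, not -2
negative_floor = -5 // 2
print(negative_floor)  # -3

# divmod() returns the floor quotient and the remainder together
quotient, remainder = divmod(5, 2)
print(quotient, remainder)  # 2 1
```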

### Why Python 3?
Python 2 was in use for a long time, and many legacy systems were developed in it.<br> 
Python 2 reached its end of life on January 1, 2020 and no longer receives updates. Python 3 has become the popular choice: most programmers now develop software in version 3, and that software may not be usable in the previous version.
Many companies are also converting their legacy version 2 code to version 3.
For a smooth learning experience, we suggest using version 3 instead of version 2, as the Refactored Platform is built in version 3.

## Learn more about Python

Now you know the basics of coding in Python. It gives you a powerful way to analyze data and build machine learning models. This notebook provides only a flavor of the basics; Python is a diverse language, and this notebook is meant to be a window into its capabilities. If you want to keep learning Python and using Jupyter notebooks, check out our course at https://refactored.ai. Our course on Python covers everything from introductory Python to pandas and data visualization with Plotly and Bokeh.
