# Python 101
---

## Python

* For a data scientist, a programming language is just a tool to organize data, test hypotheses and build models
* We will use Python as it is a user-friendly, high-level, general-purpose, dynamic and interpreted scripting language
* It has widespread applications in
    * developing software,
    * building web applications,
    * scientific computing
    * data science

This chapter is therefore intended to serve as a guide for you to learn the most pertinent and helpful features of Python that will enable you to process and analyze structured and unstructured data effectively. Remember, it is not necessary to become a proficient Python developer to be productive as a data scientist. However, as is the case with learning any new language, you will need to acquire a decent vocabulary and be able to read and understand code.

---
## The Data Science Process

What do data scientists do? OSEMN things (pronounced 'awesome')
* `O` -> Obtain, gathering data from websites, APIs, databases, files etc. 
* `S` -> Scrub, cleaning and organising data into an analysis friendly shape and granularity 
* `E` -> Explore, visualise data and discover relationships between dependent and independent features  
* `M` -> Model, train and test models that fit your data
* `N` -> iNdustrialise, embed the ML or AI model into a system that feeds data into the model and receives output from it

We will now take a look at Python's *standard library* and discover important building blocks that help us accomplish these OSEMN things.

---
## Set up the Environment

1. The quickest and hassle-free way to get up and running with the Python Data Science stack is by using the bundled distribution called Anaconda<br>[Download and Installation instructions: Anaconda](https://docs.anaconda.com/anaconda/install/)
2. If you don’t want the hundreds of packages included with Anaconda, install `Miniconda`, a lighter version of Anaconda that includes only the essentials<br>[Download and Installation instructions: Miniconda](https://docs.anaconda.com/miniconda/miniconda-install/) 
3. Though not essential right now, it is useful to know about virtual environments. [Read more about environments here](https://realpython.com/python-virtual-environments-a-primer/#how-can-you-work-with-a-python-virtual-environment)
4. Alternative: Use Python (comes pre-installed with many operating systems) and `pip`. <br>You can install the required libraries by opening the Terminal or Command Prompt and typing:
`pip install pandas jupyterlab seaborn scikit-learn`
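
Once the installation finishes, a quick check from Python can confirm that the packages are importable. The snippet below is a minimal sketch using only the standard library; the helper name `check_packages` is made up for illustration, and note that `scikit-learn` is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names of the packages installed above
# (scikit-learn is imported as `sklearn`)
packages = ["pandas", "jupyterlab", "seaborn", "sklearn"]


def check_packages(names):
    """Return a dict mapping each import name to True if Python can find it."""
    return {name: find_spec(name) is not None for name in names}


for name, available in check_packages(packages).items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

Any package reported as missing can be installed again with `pip install <package-name>`.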

---
## Install Jupyter

Python is an **interpreted** language, which means that the code is executed *line-by-line*. This allows for **interactive** programming, which is critical for exploratory data science. Depending on the task at hand, Python allows you the flexibility to use it in one or more of several ways.

1. **The Python Interpreter** is a REPL (Read Eval Print Loop) that presents the user with a `>>>` prompt, making it very easy to run and test small snippets of code.
2. **The IPython Shell** is richer than the basic interpreter as it provides development enhancements like command history, auto-complete suggestions and code/object introspection.
3. **Python Scripts** allow you to bundle complex logic in files saved with the `.py` extension, which can be run all at once from the CLI using `$ python my_script.py`
4. **Jupyter** is a web-based interface to IPython that allows authors to create engaging documents that combine *live code, narrative text*, LaTeX equations, HTML objects (images, sound and videos) and even interactive widgets. It has the following advantages

    - Support for 40+ languages (including Julia, Python, R and Scala) through kernels
    - In-line visualizations and interactive widgets
    - Support for distributed computing
    - Code annotations with Markdown

An instance of the Jupyter Notebook can be started by typing `$ jupyter lab`

This will start 
- A backend kernel process that will handle the execution of your code
- A frontend web application (in your default browser) that allows you to edit code

To begin working
- Open the URL printed in the terminal, which looks like `http://localhost:8888/?token=abcd1234`
- From the *Launcher* tab, select the *Python 3* notebook
- Please visit [Project Jupyter](http://jupyter.org) to familiarize yourself with the latest and greatest features of Jupyter.

---
## Using Notebooks

- Notebooks are a collection of **cells** which can be used to write code, markdown or raw text.
- Press `enter` to go inside a cell and type some code or text
- Pressing `esc` puts the notebook into *command mode* and allows you to move across cells
- When in command mode, the following shortcuts are useful
    - `a` or `b` add a new cell above or below the current one
    - `dd` deletes a cell
    - `m` changes the type to markdown
    - `y` changes the type to code
    - `r` changes the type to raw text
- Use `CTRL + b` to show/hide the side panel
- Press `SHIFT + ENTER` to run a cell
- Autocomplete object names with `tab`
- A full list of keyboard shortcuts is available under the `Help` section

---
## Your first Python program

- Go to a cell and write the code below
- When done, press `shift + enter`

```python
print("Hello Python!")
```

---
## Another example

```python
# Declare a list of numbers
some_data = [2, 15, -6, 28, 39, 0, 52]

# Define a function
def adder(x):
    """
    This function takes as input a list of numbers, and prints whether
    their sum is odd or even. It returns the sum.
    """
    total = 0  # named `total` to avoid shadowing the built-in sum()
    # Loop over the list of numbers
    for i in x:
        total += i
    # Check whether the sum is divisible by 2
    if total % 2 == 0:
        print("The sum is even.")
    else:
        print("The sum is odd.")
    return total

# Call the function
some_data_sum = adder(some_data)

# Print the output
print(some_data_sum)
```

This should produce the following output

```
The sum is even.
130
```

The script doesn't do much, but it's only meant to highlight some important aspects of Python syntax. 
Now let's go over the code line-by-line and understand what's happening.

- Comments begin with a `#` sign
- `some_data` is a variable name.
- The comma-separated numbers inside square brackets `[]` make up a `list`, a Python collection that is *iterable*
- The `=` creates a binding between the variable name and the list of numbers. This is also called an *assignment*
- *Functions* are defined with the `def` keyword. The name of this function is `adder`. Functions can optionally *return* objects
- *Triple quotes* delimit multi-line strings. It is customary to begin a Python function with such a string, called a *docstring*, that describes what the function does
- Blocks of code (*for-loop, if-else* block) are denoted using *indentation* and not curly braces (unlike R, Java, C++)
    - This is critical as it enforces a style of syntax that makes most Python code look cosmetically similar and hence improves readability.
- A colon `:` denotes the start of an indented code block after which all of the code must be indented by the same amount until the end of the block
- The `for` loop iterates over the list of numbers
- The `if ... else` statement is used to check logic. (The `%` operator finds the remainder of a division.)
- Python statements aren't terminated using semi-colons `;`

---
## The Zen of Python & Pythonic Code

In a new cell, type `import this` and run it.
You'll find a list of tenets laid down by Tim Peters that describe the philosophy of the creators of Python.

One of these is often the subject of lengthy discussions: `There should be one-- and preferably only one --obvious way to do it.`

Code written in accordance with this “obvious” way is often described as being **Pythonic**.

On community-driven forums like *StackOverflow*, you will often find questions to the tune of - *Is there a Pythonic way of doing X?* asked by programmers who feel that the code they've written is ugly and/or too complex. Though this may seem mysterious right now, as you read and write more Python code, it will become evident to you and you will begin to leverage features of the language in the fabled obvious way.

As in any other language, there are often multiple ways of doing the same thing in Python, and whenever faced with such a choice, we will favor the Pythonic way over others.
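
As a small illustration of what "Pythonic" can mean in practice, the sketch below adds up a list in two ways: first with an explicit loop, then with the built-in `sum()` function. Both give the same answer; the second is simply the more obvious way once you know the built-in exists. The variable names are chosen only for this example.

```python
some_data = [2, 15, -6, 28, 39, 0, 52]

# Explicit loop: works, but spells out every step
total = 0
for number in some_data:
    total += number

# Pythonic: let the built-in sum() do the work
pythonic_total = sum(some_data)

# A conditional expression picks the label in a single line
parity = "even" if pythonic_total % 2 == 0 else "odd"

print(total, pythonic_total, parity)
```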

---
## Getting help on-the-go

Never fret or despair when you're learning Python, for help is always at hand! 

Python's built-in documentation (the docstrings attached to objects) is accessible from within the Jupyter ecosystem. Whenever you find yourself stuck or unclear on what a function does or whether you're typing the right syntax, just call for help by appending a question mark `?` at the end of the object, and run the cell.

For example, let's assume you don't know what the `id()` function does. You may simply write `id?` and the function's docstring will be displayed for you to read through and take note of things such as
- What arguments does the function take?
- What is the data type of the returned object?

If you need even more information on a function or object, try using a double question mark `??`.


Some other useful functions in the standard library include 
* `dir(object)` displays a list of `methods` associated with the object. Methods are functions embedded within objects.
* `type(object)` shows you which type an object belongs to


---

## Advice for budding programmers

To get started with learning any new **OOP** language, the very first steps you have to take include learning about:

- variables, values and objects
- data types
- data structures
- arithmetic, control and logical operators

Remember that learning a new programming language is much like learning a new foreign language. You cannot claim to know the new language until you:

- have built a decent vocabulary (conversational, at least)
- can translate/express ideas effectively

The following sections will take you through the **core** of the language. Build your vocabulary (commit certain things to memory) as you go along, and try to express concepts that you know (for example, generating numbers from the Fibonacci sequence) using what you learn.

---
## A Note on Object Oriented Programming

> Everything in Python is an Object

- Objects are derived from Classes, which are definitions of data and associated methods
    - *Car* is an object of type *Vehicle*. So is *scooter*
    - *Human* is an object of type *Mammal*. So is *dolphin*. 
- Objects have **Attributes**, accessed by placing a dot after the object's name and with no parentheses after the attribute's name
    - `car.num_tyres, scooter.num_tyres`
    - `human.has_tail, dolphin.has_tail`
- Objects also have **Methods**, accessed by placing a dot after the object's name and parentheses after the method's name. Methods work on the data stored inside the object. They can optionally take input data as `arguments` and optionally return a result
    - `car.add_sunroof(1), scooter.drop_spare_tyre(1)`
    - `human.add_vaccine(5), dolphin.max_depth(50)`

Much of programming in Python revolves around creating Objects from Classes. 
We will then explore the methods and attributes of objects of every kind of class.

For example, we can create a data table using the class DataFrame from the pandas library.
More on libraries to come later.

Much like other OOP languages, Python's object model is remarkably consistent. 
Every number, string, data structure, function, class, module etc. exists inside the interpreter in its own "box", which is referred to as a Python object. 

Each object in Python has the following things associated with it.
- a data type (for example, 'string' or 'function')
- internal data
- some metadata (for example, `shape`) called attributes
- some functions (for example, `count`, `join`, `append`) called methods that have access to the internal data. In other words, a method is a function that “belongs to” an object
- These can be accessed using the dot syntax `<obj-name>.<attribute-or-method>`.
- Methods are followed by parentheses `()`, which hold any additional parameters they take.
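
To make the vocabulary above concrete, here is a minimal sketch of a class with one attribute and one method. The `Vehicle` class and its `num_sunroofs` attribute are hypothetical, written only to mirror the car and scooter examples; they are not part of any library.

```python
class Vehicle:
    """A hypothetical class used only to illustrate attributes and methods."""

    def __init__(self, name, num_tyres):
        # Attributes: data stored inside each object
        self.name = name
        self.num_tyres = num_tyres
        self.num_sunroofs = 0

    def add_sunroof(self, count):
        # Method: a function that works on the object's own data
        self.num_sunroofs += count
        return self.num_sunroofs


# Create two objects (instances) from the same class
car = Vehicle("car", 4)
scooter = Vehicle("scooter", 2)

# Attributes: dot, no parentheses
print(car.num_tyres, scooter.num_tyres)

# Methods: dot, then parentheses holding the arguments
print(car.add_sunroof(1))

print(type(car))
```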

# The Python Standard Library

> Python has a small set of built-in types for handling numerical data, strings, boolean (True or False) values, date/time, missing values. 

These are the quintessential building blocks for storing and manipulating data in Python.

**Python primitives**

```python
None  - the Python null value
str   - text strings (Unicode in Python 3)
int   - signed integers of arbitrary size (Python 3 has no separate long type)
float - 64-bit (double precision) floating point numbers
bool  - a True or False value
```
Check the data type of an object using `type()` or verify it with `isinstance()`.
The function `isinstance()` takes two inputs, an object and a type, and returns `True` if the object belongs to the specified type.

```python
In : isinstance(2, int)
Out: True

In : type(3.41)
Out: float

In : type(None)
Out: NoneType

In : type(True)
Out: bool
```
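
`isinstance()` also accepts a tuple of types, which is handy when a value may legitimately be either an `int` or a `float`. One subtlety worth knowing is that `bool` is a subclass of `int`, so `True` passes an `int` check. The helper `is_number` below is a hypothetical name used for this sketch.

```python
def is_number(value):
    """Return True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


print(isinstance(3.41, (int, float)))  # True: matches one of the listed types
print(isinstance(True, int))           # True: bool is a subclass of int
print(is_number(42))                   # True
print(is_number(True))                 # False
print(is_number("42"))                 # False: a string, not a number
```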

---
##  Numeric Type `int`

- The most basic numerical type is the int. Any number without a decimal point is an integer.
- Pressing `tab` following a dot after an `int` object shows you all the methods associated with it
    - Use the `?` to know what a method does

In [2]:
x = 5
type(x)

int
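
A few things you can do with `int` objects are sketched below. The variable names are illustrative; the operators and the `bit_length()` method are part of Python itself.

```python
x = 5

# Integers in Python 3 grow as large as needed
big = 2 ** 100
print(type(big))

# bit_length() reports how many bits are needed to represent the number
print(x.bit_length())  # 3, because 5 is 101 in binary

# Floor division, remainder and true division
quotient = 7 // 2    # 3
remainder = 7 % 2    # 1
ratio = 7 / 2        # 3.5 (true division always gives a float)
print(quotient, remainder, ratio)
```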

In [None]:
# press tab after the dot
x.

Learn more about each of these by running commands like

In [3]:
x.conjugate?

Docstring: Returns self, the complex conjugate of any int.
Type:      builtin_function_or_method

---
##  Numeric Type `float`

- Any number with a decimal point is stored internally as an instance of type float.
- These can be defined in standard or scientific notation

In [4]:
float_1 = 3.142
float_2 = 3e8

In [5]:
type(float_1), type(float_2)

(float, float)
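
Because floats are stored in binary, most decimal fractions cannot be represented exactly, which occasionally surprises newcomers. The short sketch below shows the classic example and one way around it using `math.isclose()` from the standard library.

```python
import math

float_1 = 3.142
float_2 = 3e8

print(float_2)               # 300000000.0
print(float_2.is_integer())  # True: no fractional part
print(float_1.is_integer())  # False

# Tiny rounding errors make exact comparisons unreliable
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compares within a tolerance
```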

In [None]:
# press tab after the dot
float_2.

In [15]:
# learn more about float methods
float_1.is_integer?

Signature: float_1.is_integer()
Docstring: Return True if the float is an integer.
Type:      builtin_function_or_method

## String Type `str`

- Many programmers favor Python owing to its flexible and powerful built-in string processing capabilities.

- Objects of type `str` are created with the single `'`, double `"`, triple `"""` quotes or using the `str()` function

```python
str_1 = 'This is a string' 
str_2 = "This too is a string"
str_3 = """This is a string as well"""
str_4 = str(1.716)
```
  
- Strings are *iterables*, meaning you can loop over the characters

```python
for c in 'hello':
    print(c)
```

- Some useful string Methods

```python
str_1 = "a quick brown fox"

# convert to uppercase
In : str_1.upper() 
Out: 'A QUICK BROWN FOX'

# replace characters
In : str_1.replace('fox', 'dog') 
Out: 'a quick brown dog'

# split string on a delimiter and returns a list.
In : str_1.split(' ')
Out: ['a', 'quick', 'brown', 'fox']

# returns index of first instance of a character/word 
In : str_1.find('quick') 
Out: 2

# returns count of a character
In : str_1.count('o') 
Out: 2

# returns boolean if patterns match
In : str_1.endswith('ox') 
Out: True
```

- Explore more methods by typing a dot after the object and pressing `tab`

In [None]:
str_1 = "a quick brown fox"

In [None]:
# hit tab after the dot
str_1.

In [None]:
# use the ? to explore
str_1.capitalize?

- **String Arithmetic**: The `+` operator concatenates strings, and the `*` operator creates copies

```python
In : str_1 + ' jumps over the' + ' lazy dog' 
Out: 'a quick brown fox jumps over the lazy dog'

In : 'mango ' * 5
Out: 'mango mango mango mango mango '
```

- Try it out yourself in the cell below:

- Strings can be **subscripted**, allowing you to extract parts of them using square brackets `[]` and character locations/ranges
    - **Remember that indexing in Python starts at 0 and ends at N-1**
- It follows the syntax `my_str[start:stop:step]` 

```python
str_1 = 'This is a string'

In : str_1[:4]
Out: 'This'
```

In [None]:
s = 'a hot air balloon'

In [None]:
# Extract a single character by its position
s[2]

In [None]:
# Extract many characters using SLICES
s[5:]

In [None]:
s[:10]

In [None]:
s[2:5]

In [None]:
# Skipping over characters
s[::2]

In [None]:
s[:15:3]

In [None]:
# reverse the string
s[::-1]

In [None]:
s.endswith('loon')

In [None]:
s.startswith('a hot')

In [None]:
# Splitting a string gives a List
s.split(" ")

In [None]:
list_1 = s.split(' ')
type(list_1)
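
For reference, here is a sketch that collects several of the slices above together with the values they produce, so you can check your own results. Negative indices, which count from the end of the string, are included as an extra; the variable names are illustrative.

```python
s = 'a hot air balloon'

single_char = s[2]       # 'h'
middle = s[2:5]          # 'hot'  (start is included, stop is excluded)
every_second = s[::2]    # 'ahtarblon'
every_third = s[:15:3]   # 'aoa l'
reversed_s = s[::-1]     # 'noollab ria toh a'

# Negative indices count backwards from the end
last_char = s[-1]        # 'n'
last_word = s[-7:]       # 'balloon'

print(single_char, middle, every_second, every_third, sep=' | ')
print(reversed_s, last_char, last_word, sep=' | ')
```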

- Find substrings within strings with the `in` keyword

In [None]:
'air' in s

In [None]:
'fare' in s

- Also: the `find()` and `index()` methods

In [None]:
s.find('air')

In [None]:
s.find('hair')

In [None]:
s.index('hot')

In [None]:
s.index('pot')
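
The two methods differ in how they report a missing substring: `find()` returns `-1`, while `index()` raises a `ValueError`. The sketch below demonstrates both; `safe_index` is a hypothetical helper that uses `try`/`except` to catch the error.

```python
s = 'a hot air balloon'

print(s.find('air'))    # 6
print(s.find('hair'))   # -1, meaning "not found"
print(s.index('hot'))   # 2


def safe_index(text, sub):
    """Return the position of sub in text, or None if it is absent."""
    try:
        return text.index(sub)
    except ValueError:
        # index() raises ValueError when the substring is missing
        return None


print(safe_index(s, 'pot'))  # None
```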

### TASK: Learn about and experiment with the following str methods. Think of an application for them.

---
##  Type Conversion functions

- Convert to float using `float()`
- Convert to int using `int()`
- Convert to str using `str()`

In [16]:
int(3.142)

3

In [17]:
float(42)

42.0

In [21]:
int('42')

42

In [20]:
float('3.142')

3.142

In [23]:
str(3)

'3'

In [24]:
str(3.142)

'3.142'
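
Two details of these conversions are easy to trip over: `int()` truncates towards zero rather than rounding, and it refuses strings that contain a decimal point. The sketch below shows both; `to_int` is a hypothetical helper that converts via `float()` first.

```python
print(int(3.99))     # 3: the fractional part is dropped, not rounded
print(int(-3.99))    # -3: truncation is towards zero
print(round(3.99))   # 4: use round() when you want rounding


def to_int(text):
    """Convert a numeric string such as '42' or '3.142' to an int."""
    return int(float(text))


print(to_int('3.142'))  # 3

# int('3.142') on its own raises a ValueError
try:
    int('3.142')
except ValueError as error:
    print('Conversion failed:', error)
```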

### Task: Fix the error in this

---
## Type `bool`

- Useful in conditional programming

In [65]:
1 == 1

True

In [66]:
1 != 1

False

In [67]:
4 < 5

True

In [68]:
5 < 2

False

In [69]:
(4 < 5) and (5 < 2)

False

In [70]:
x = True

In [71]:
type(x)

bool

In [72]:
isinstance(x, bool)

True

In [73]:
# explore methods
x.

SyntaxError: invalid syntax (1209101337.py, line 1)

In [74]:
bool([])

False
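
The last cell hints at a wider rule: every Python object can be tested for truth. Empty containers, zero and `None` count as false, and nearly everything else counts as true. A short sketch, with illustrative variable names:

```python
falsy_values = [0, 0.0, '', [], {}, None]
truthy_values = [1, -1, 'a', [0], ' ']

print([bool(v) for v in falsy_values])   # all False
print([bool(v) for v in truthy_values])  # all True

# This lets you test for "empty" without comparing lengths
scores = []
if scores:
    message = 'We have scores'
else:
    message = 'No scores yet'
print(message)
```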

## The `None` type

- useful for denoting missing data

In [80]:
x = None

In [83]:
type(x)

NoneType

In [76]:
x.
# no methods or attributes

In [77]:
x is None  # `is` is the idiomatic way to test for None

True

In [84]:
isinstance(x, type(None))

True
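
`None` often appears as the result of a function that has nothing to return. The sketch below uses a hypothetical function, `find_first_negative`, to show the pattern and the identity check with `is None`.

```python
def find_first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            return n
    return None


result = find_first_negative([2, 15, 28])
print(result is None)                    # True: nothing was found
print(find_first_negative([2, -6, 28]))  # -6
```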

---
## Importing Modules (or Libraries)

Modules are collections of Classes, Functions and other objects created by someone that you can *import* into your workspace and use.
So far, we've worked only with the Standard Library and the objects within it.

Python's functionality is extended via Modules that contain specialized sets of objects for a variety of domains like astronomy, machine learning, biological research, web development and so on.

There are 3 ways of doing this

1. Import the entire module (**not** recommended)

~~import pandas~~ <br>
~~pandas.DataFrame~~

Also avoid writing `from pandas import *`

2. Provide an alias (recommended)

```python
import pandas as pd
pd.DataFrame
```

3. Import whatever you need (recommended)

```python
from pandas import DataFrame
DataFrame
```
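
The same two recommended patterns work for any module. The sketch below uses `statistics` and `math`, which ship with Python, so it runs even before pandas is installed; the alias `stats` is simply a choice made for this example.

```python
# Pattern 2: import a module under a short alias
import statistics as stats

# Pattern 3: import only the names you need
from math import pi, sqrt

marks = [2, 4, 6]
average = stats.mean(marks)
root = sqrt(16)

print(average)
print(root)
print(pi)
```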

# Flow Control

---

*Traditional*

---

```python
if condition:
    action
else:
    alternative
```

---

*Ternary*

```python
action if condition else alternative        
```

In [86]:
name = 'Mr A'


if 'Mrs' in name:
    print("this person is female")
else:
    print("this person is male")

this person is male


In [87]:
('female' if 'Mrs' in name else 'male')

'male'

Example 2

In [88]:
name = 'Alex'

In [89]:
if (name == 'Alex'):
    print('Hi Alex')

Hi Alex


In [90]:
if name != 'Alice':
    print('You are not Alice')

You are not Alice


In [91]:
if (5>1):
    print('Success')

Success


In [92]:
name = 'John'

In [93]:
if name == 'John':
    print('How are you?')
else:
    print('Nice to meet you.')

How are you?


In [94]:
name = 'Johny'
age = 15
lastname = 'Noel'

In [95]:
if (name == 'John'):
    print('How are you?')
elif age < 18:
    print("You're just a teenager")
elif (lastname == 'Doe'):
    print("Never heard that name before")
else:
    print('you do not qualify')

You're just a teenager


---
### Expressing Complex Logic

In [96]:
(5 > 2) & (10 < 3)

False

In [97]:
(5 > 2) | (10 < 3)

True

In [98]:
(5 > 2) and (10 < 3)

False

In [99]:
(5 > 2) or (10 < 3)

True
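
The pairs above give the same answers, but `&`/`|` and `and`/`or` are not interchangeable. `&` and `|` are bitwise operators that bind more tightly than comparisons, so the parentheses are essential; `and` and `or` also stop evaluating as soon as the result is known. A sketch of both differences, with illustrative variable names:

```python
with_parentheses = (5 > 2) & (10 < 3)   # False
# Without parentheses this is read as 5 > (2 & 10) < 3, which is True
without_parentheses = 5 > 2 & 10 < 3
using_and = 5 > 2 and 10 < 3            # False, no parentheses needed

print(with_parentheses, without_parentheses, using_and)

# `and` short-circuits: the division is never attempted when x is 0
x = 0
safe = (x != 0) and (10 / x > 1)
print(safe)  # False
```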

## TERNARY IF-THEN-ELSE

In [100]:
x = 5; y = 3

1 if x > y else 0

1

In [101]:
name = 'John'

In [103]:
'How are you' if name == 'John' else 'Hello Stranger!'

'How are you'

In [105]:
if name == 'John':
    print('How are you')
else:
    print('Pleased to meet you.')

How are you


### TASK

The marks for a student in six subjects are as follows: 78, 93, 81, 84, 57 and 90


Use the if-else construct to print the student's grade in each subject:

- A: 90 and above
- B: 80 to 89
- C: 60 to 79
- D: 40 to 59
- E: under 40

---

# The `for` Loop

- Generating numbers for iterating over

In [108]:
list(range(0, 10, 2))

[0, 2, 4, 6, 8]

In [109]:
list(range(0, 1, .2))

TypeError: 'float' object cannot be interpreted as an integer

In [110]:
import numpy as np
np.arange(0, 1, .2)

array([0. , 0.2, 0.4, 0.6, 0.8])

In [111]:
np.linspace(0, 1, 50)

array([0.        , 0.02040816, 0.04081633, 0.06122449, 0.08163265,
       0.10204082, 0.12244898, 0.14285714, 0.16326531, 0.18367347,
       0.20408163, 0.2244898 , 0.24489796, 0.26530612, 0.28571429,
       0.30612245, 0.32653061, 0.34693878, 0.36734694, 0.3877551 ,
       0.40816327, 0.42857143, 0.44897959, 0.46938776, 0.48979592,
       0.51020408, 0.53061224, 0.55102041, 0.57142857, 0.59183673,
       0.6122449 , 0.63265306, 0.65306122, 0.67346939, 0.69387755,
       0.71428571, 0.73469388, 0.75510204, 0.7755102 , 0.79591837,
       0.81632653, 0.83673469, 0.85714286, 0.87755102, 0.89795918,
       0.91836735, 0.93877551, 0.95918367, 0.97959184, 1.        ])
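
If NumPy is not available, fractional steps can still be produced from `range()` by generating integers and scaling them. This is a minimal sketch of that idea and is not a replacement for `np.arange` or `np.linspace`.

```python
# Generate 0, 1, 2, 3, 4 and divide each by 5 to get steps of 0.2
values = [i / 5 for i in range(0, 5)]
print(values)  # [0.0, 0.2, 0.4, 0.6, 0.8]

# The same values, used directly in a loop
for i in range(0, 5):
    print(i / 5)
```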

- Ranges, like lists, are iterables

In [115]:
r = range(50, 101, 10)

for i in r:
    print(i)

50
60
70
80
90
100


In [117]:
for i in r:
    i = i+10
    print(i)

60
70
80
90
100
110


In [120]:
my_str = 'The sky is blue.'
print(my_str, '\n')

for i in my_str.split(' '):
    print(i.capitalize())

The sky is blue. 

The
Sky
Is
Blue.
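
Two related points are worth a quick sketch. First, `enumerate()` gives you the position of each item alongside the item itself. Second, assigning to the loop variable (as in `i = i+10` above) changes only that name, not the collection being looped over. The variable names here are illustrative.

```python
my_str = 'The sky is blue.'
words = my_str.split(' ')

# enumerate() yields (position, item) pairs
for position, word in enumerate(words):
    print(position, word.capitalize())

numbers = [50, 60, 70]
for n in numbers:
    n = n + 10  # rebinds the name n only; the list is untouched
print(numbers)  # [50, 60, 70]

# To build the modified values, collect them in a new list
shifted = [n + 10 for n in numbers]
print(shifted)  # [60, 70, 80]
```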


---

### Task 3: Print all PRIME numbers between 0 and 100

- Prime numbers are whole numbers greater than 1 that are divisible only by 1 and themselves

### Task 4: Primes till a given number

---

### Task 5: Solve the grading problem above using a for loop.

Hint: Put the scores inside a list.

Here's how you declare a list

    my_list = [12, 17, 21, 24]

### Task 6: Print this pattern using `for` loops

    *
    **
    ***
    ****
    *****
    *****
    ****
    ***
    **
    *