# Getting Started with Python

## Data Types

### Numeric Data

Numeric data falls into two main categories:

- Integer
 - A whole number, the same `integer` we know from mathematics.
- Float
 - You can think of this as a number with a decimal point.
 - It approximates the `real numbers` of mathematics with finite precision.

The cells below show that we can do all basic arithmetic computations in Python.

In [1]:
1 + 5

6

In [2]:
3 - 10

-7

In [3]:
1.23 * 0.6

0.738

In [4]:
43 / 2

21.5

In [5]:
5 ** 2

25

In [6]:
15 % 6

3
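
Two details the cells above do not show, sketched here as a small aside: floats are stored in binary, so some decimal results are only approximate, and `//` performs floor division. The variable names are illustrative only.

```python
import math

# Floats are binary approximations, so this sum is not exactly 0.3.
total = 0.1 + 0.2
print(total == 0.3)              # exact comparison fails
print(math.isclose(total, 0.3))  # compare with a tolerance instead

# `/` always returns a float; `//` rounds down to a whole number.
quotient = 43 // 2
remainder = 43 % 2
print(quotient, remainder)
```

When comparing floats, a tolerance-based check such as `math.isclose` is usually safer than `==`.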

You can assign the result of a computation to a variable, which is what you will do most of the time.

In [7]:
sales = 10000
rate = 0.03
commission = sales*rate
print(commission)

300.0


<b> Note: </b> A value that is conceptually an integer can still be stored as a float, as `1.0` is below. Python is also great because it is human-readable: the `type` function returns the data type of any value.

In [8]:
a = 1.0
print(type(a))

<class 'float'>


In [9]:
type(sales)

int

### Strings

In [10]:
'This is a string.'

'This is a string.'

In [11]:
"This is also a string"

'This is also a string'

In [12]:
"This is a string that has '' in it"

"This is a string that has '' in it"

In [13]:
'This is a string that has "" in it'

'This is a string that has "" in it'

In [14]:
my_name = 'Benjamin'

In [15]:
my_name

'Benjamin'

In [16]:
print(my_name)

Benjamin


As with most things in life, there are multiple ways to do the same thing in Python. My advice is to pick one way and stick with it. Once you are comfortable with Python syntax, it is easy to experiment with your code.

In [17]:
Age = 23
Name = 'Benjamin'
print('My name is: {}, and my age is: {}'.format(Name,Age))

My name is: Benjamin, and my age is: 23


In [18]:
# This is my suggested way, since it is easier to read.
print('My name is:', Name, 'and my age is', Age)

My name is: Benjamin and my age is 23
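
Python 3.6 and later offer one more option, the f-string. The sketch below is only an illustration; the lowercase variable names are my own and simply mirror the values used above.

```python
# Hypothetical values mirroring the cells above.
name = 'Benjamin'
age = 23

# An f-string evaluates the expressions inside the braces.
message = f'My name is: {name}, and my age is: {age}'
print(message)
```

It produces the same text as the `.format` call, with the variables placed where they appear in the sentence.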


### Booleans

- It's simply either `True` or `False`.
- We'll see more of this when we discuss:
 - `Logic Operators`
 - `if`,`elif`, `else` statements
 - `for` and `while` loops
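
Until then, here is a minimal sketch of where booleans come from. The measurement is a made-up value, not data from this tutorial.

```python
height_cm = 170   # hypothetical measurement

# A comparison produces a boolean.
is_tall = height_cm >= 167
print(is_tall, type(is_tall))

# Empty containers and zero count as False; most other values count as True.
print(bool([]), bool(0), bool('text'))
```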

### Lists

In [19]:
[1,2,3]

[1, 2, 3]

In [20]:
['Hello',2,3]

['Hello', 2, 3]

In the data science workflow, knowing how to work with a list inside a list is helpful in data cleaning.

In [21]:
['Hello',2,3,[6,7]]

['Hello', 2, 3, [6, 7]]

In [22]:
my_list = [10,9,8,7]

In [23]:
my_list.append('a')

In [24]:
my_list

[10, 9, 8, 7, 'a']

In [25]:
my_list[0]

10

In [26]:
my_list[2]

8

In [27]:
my_list[2:]

[8, 7, 'a']

In [28]:
my_list[:3]

[10, 9, 8]

In [29]:
my_list[4] = 6

In [30]:
my_list

[10, 9, 8, 7, 6]

In [31]:
len(my_list)

5

In [32]:
sum(my_list)

40

In [33]:
nested_list = ['Hello',2,3,[6,7]]

In [34]:
nested_list[3][0]

6
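
To make the data cleaning remark concrete, here is a small sketch that treats each inner list as one record. The `rows` data and the use of `None` for a missing age are assumptions made for illustration.

```python
# Hypothetical rows: [name, age]; None marks a missing age.
rows = [['Benjamin', 23], ['Vincent', None], ['Albert', 25]]

# rows[i] is an inner list; rows[i][j] is one value inside it.
print(rows[0][1])

# Keep only the rows whose age is present.
complete_rows = []
for row in rows:
    if row[1] is not None:
        complete_rows.append(row)

print(complete_rows)
```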

### Dictionaries

- Key-value pairs
- Useful when working with JSON data, since a JSON object maps directly onto a Python dictionary (sample use: Facebook Graph API).

In [35]:
dictionary = {'Name':'Benjamin','Age':23}

In [36]:
dictionary

{'Name': 'Benjamin', 'Age': 23}

In [37]:
dictionary['Name']

'Benjamin'

In [38]:
dictionary['Gender'] = 'Male'

In [39]:
dictionary

{'Name': 'Benjamin', 'Age': 23, 'Gender': 'Male'}
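
The JSON connection can be sketched with the standard `json` module. The JSON text below is invented for illustration; a real API response would have its own fields.

```python
import json

# Hypothetical JSON text, as an API might return it.
raw = '{"Name": "Benjamin", "Age": 23, "Skills": ["Python", "SQL"]}'

record = json.loads(raw)        # JSON text -> Python dict
print(record['Skills'][0])

# .get() returns a default instead of raising KeyError for a missing key.
print(record.get('Gender', 'unknown'))

text = json.dumps(record)       # Python dict -> JSON text
print(text)
```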

### Tuples

- Immutable (values cannot be changed after the tuple is created).

In [40]:
t = (1,2,3,4,5,6,7)

In [41]:
t[3]

4

In [42]:
t[3:]

(4, 5, 6, 7)

In [43]:
t[5] = 6

TypeError: 'tuple' object does not support item assignment
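
Since a tuple cannot be changed in place, one workaround is to build a new tuple from slices. The sketch below also shows unpacking, a common use of tuples; `point` is a made-up example.

```python
t = (1, 2, 3, 4, 5, 6, 7)

# Build a new tuple in which position 5 holds 60; `t` is left untouched.
updated = t[:5] + (60,) + t[6:]
print(updated)

# Unpacking assigns each element of a tuple to its own name.
point = (3, 4)   # hypothetical coordinates
x, y = point
print(x, y)
```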

### Sets

- Conceptually like a set in mathematics: repeated entries are stored only once.

In [44]:
{1,2,3}

{1, 2, 3}

In [45]:
{1,2,3,1,2,1,2,3,3,3,3,2,2,2,1,1,2}

{1, 2, 3}

In [46]:
[1,2,3,1,2,1,2,3,3,3,3,2,2,2,1,1,2]

[1, 2, 3, 1, 2, 1, 2, 3, 3, 3, 3, 2, 2, 2, 1, 1, 2]
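
A short sketch of what sets are typically used for: removing duplicates from a list and the usual set operations. The names `values`, `a`, and `b` are illustrative.

```python
values = [1, 2, 3, 1, 2, 1, 2, 3, 3]

unique_values = set(values)   # duplicates collapse
print(len(values), len(unique_values))

a = {1, 2, 3}
b = {3, 4}
print(a | b)   # union
print(a & b)   # intersection
print(a - b)   # difference
print(2 in a)  # membership test
```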

## Comparison and Logic Operators

In [47]:
10 > 2

True

In [48]:
2 - 5 < -10

False

In [49]:
5 >= 5

True

In [50]:
6 <= 100

True

In [51]:
x = 'Benjamin'

In [52]:
x == 'Benjamine'

False

A data science use case for logic operators is filtering data. <br>

(Example: get the records of patients with $Height \geq 167cm$ `and` $Weight \leq 50kg$)

Spoiler alert: filtering a `pandas` DataFrame uses a slightly different syntax.

In [53]:
(10>5) and (3<6)

True

In [54]:
(10>5) and (3<2)

False

In [55]:
(10>5) or (3<6)

True

In [56]:
(10<5) or (3<2)

False
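
The patient example mentioned earlier can be sketched in plain Python. The records below are made up, and storing them as a list of dictionaries is just one possible layout.

```python
# Hypothetical patient records; the names and numbers are made up.
patients = [
    {'name': 'A', 'height_cm': 170, 'weight_kg': 48},
    {'name': 'B', 'height_cm': 160, 'weight_kg': 45},
    {'name': 'C', 'height_cm': 175, 'weight_kg': 70},
]

selected = []
for patient in patients:
    # Both conditions must hold for the record to be kept.
    if patient['height_cm'] >= 167 and patient['weight_kg'] <= 50:
        selected.append(patient['name'])

print(selected)
```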

## if, elif and else Statements

This is a very important concept and is widely used in practice, often together with a `for` loop.

In [57]:
if 1>2:
    print('Correct')

In [58]:
if 1>2:
    print('Correct')
else:
    print('Incorrect')

Incorrect


`elif` (I read it as "else if") is used when there are more than two possible outcomes. Python checks the conditions from top to bottom and runs only the first branch whose condition is true. This makes more sense when we use it together with a `for` loop.

In [59]:
if 1>2:
    print('Wrong')
elif 1 == 1:
    print('Correct')
else:
    print('Incorrect')

Correct


## for Loops

In [60]:
first_integers = [1,2,3,4,5,6,7,8,9,10]

In [61]:
for i in first_integers:
    print(i)

1
2
3
4
5
6
7
8
9
10


### for loops and Conditional Statements (if, elif and else)

In [62]:
for i in first_integers:
    if i % 2 == 0:
        print(i,'is Even')
    else:
        print(i, 'is Odd')

1 is Odd
2 is Even
3 is Odd
4 is Even
5 is Odd
6 is Even
7 is Odd
8 is Even
9 is Odd
10 is Even


In [63]:
Names = ['Benjamin','Vincent','Albert']

In [64]:
for i in Names:
    if i == 'Benjamin':
        print(i+"'s",'Age:',23)
    elif i == 'Vincent':
        print(i+"'s",'Age:',24)
    else:
        print(i+"'s",'Age:',25)

Benjamin's Age: 23
Vincent's Age: 24
Albert's Age: 25


### List Comprehension

The loop below builds a list one `append` at a time, which is the pattern a list comprehension condenses into a single expression.

In [65]:
Age = []

for i in Names:
    if i == 'Benjamin':
        Age.append(23)
    elif i == 'Vincent':
        Age.append(24)
    else:
        Age.append(25)

Age

[23, 24, 25]
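
The same list can be built in one expression with a list comprehension. This is a minimal sketch: the `known_ages` lookup table is my own substitute for the `if`/`elif` chain, and the lowercase names are illustrative.

```python
names = ['Benjamin', 'Vincent', 'Albert']

# Hypothetical lookup table replacing the if/elif chain.
known_ages = {'Benjamin': 23, 'Vincent': 24}

# General shape: [expression for item in iterable]
ages = [known_ages.get(name, 25) for name in names]
print(ages)

# An optional trailing `if` filters the items.
long_names = [name for name in names if len(name) > 6]
print(long_names)
```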

### range()

In [66]:
for i in range(1,10):
    print(i)

1
2
3
4
5
6
7
8
9
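
Note that the stop value is excluded, which is why the loop ends at 9. A short sketch of the other forms `range()` accepts, wrapped in `list()` so the values are visible:

```python
first_five = list(range(5))           # start defaults to 0
odd_numbers = list(range(1, 10, 2))   # third argument is the step
countdown = list(range(10, 0, -3))    # a negative step counts down

print(first_five)
print(odd_numbers)
print(countdown)
```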


## while loops

One data science use of `while` loops is gradient descent, an iterative algorithm for minimizing a function.

In [67]:
i = 1
while i < 5:
    print('i is: {}'.format(i))
    i = i+1

i is: 1
i is: 2
i is: 3
i is: 4
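
As a rough sketch of the gradient descent idea, the loop below minimizes $f(x) = (x - 3)^2$, whose minimum is at $x = 3$. The function, starting point, learning rate, and stopping rule are all assumptions chosen to keep the example small.

```python
def gradient(x):
    # Derivative of f(x) = (x - 3) ** 2
    return 2 * (x - 3)

x = 0.0               # starting guess
learning_rate = 0.1
tolerance = 1e-6
steps = 0

# Keep stepping downhill until the slope is nearly flat.
# The step limit guards against an endless loop.
while abs(gradient(x)) > tolerance and steps < 1000:
    x = x - learning_rate * gradient(x)
    steps = steps + 1

print(round(x, 4), steps)
```

A `while` loop fits here because the number of iterations is not known in advance; the loop stops once the condition is met.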


## Functions

In [68]:
def is_even(number):
    if number % 2 == 0:
        return 'The number is Even'
    else:
        return 'The number is Odd'

In [69]:
is_even(5)

'The number is Odd'

In [70]:
def dot(a,b):
    product = []
    for x, y in zip(a, b):
        product.append(x*y)
    return sum(product)

In [71]:
feature = [10,5,6,2.3]
weights = [1,2,3,4]
dot(feature,weights)

47.2
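
A more compact variant of `dot`, offered as one possible alternative rather than a replacement, passes a generator expression straight to `sum()` so no intermediate list is needed.

```python
def dot(a, b):
    # Multiply matching elements and add the products as they are produced.
    return sum(x * y for x, y in zip(a, b))

feature = [10, 5, 6, 2.3]
weights = [1, 2, 3, 4]
result = dot(feature, weights)
print(result)
```

Keep in mind that `zip` stops at the shorter input, so neither version checks that the two lists have the same length.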

## Lambda Expressions

- A 'shortcut' for defining a small, single-expression function without `def`.

In [72]:
def square_number(number):
    return number**2

In [73]:
square_number(10)

100

In [74]:
square = lambda x: x**2
square(10)

100
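
Lambdas are most useful when a function is needed only once, for example as the `key` argument of `sorted()`. The `people` list below is made up for illustration.

```python
# Hypothetical (name, age) pairs.
people = [('Benjamin', 23), ('Albert', 25), ('Vincent', 24)]

# The lambda tells sorted() to compare the second element of each pair.
by_age = sorted(people, key=lambda person: person[1])
print(by_age)
```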

## map()

- Applies a function to every element of a list (or any other iterable). It returns a lazy `map` object, so wrap it in `list()` to see the results.

In [75]:
numbers = [10,9,100,12,17,15,23]
list(map(is_even,numbers))

['The number is Even',
 'The number is Odd',
 'The number is Even',
 'The number is Even',
 'The number is Odd',
 'The number is Odd',
 'The number is Odd']
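
`map()` pairs naturally with a lambda. The sketch below uses a shortened, illustrative list and also shows the equivalent list comprehension.

```python
numbers = [10, 9, 100, 12]

squares = map(lambda x: x ** 2, numbers)   # lazy: nothing computed yet
squares_list = list(squares)               # consume the iterator
print(squares_list)

# The same result as a list comprehension.
squares_again = [x ** 2 for x in numbers]
print(squares_again)
```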

## Some String Methods

In [76]:
text_1 = 'This is Ricardo'

In [77]:
text_1.upper()

'THIS IS RICARDO'

In [78]:
text_1.lower()

'this is ricardo'

In [79]:
text_1.split(' ')

['This', 'is', 'Ricardo']
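
A few more string methods that tend to show up in data cleaning, sketched on made-up input:

```python
raw_name = '  ricardo SANTOS '   # hypothetical messy input

# Trim surrounding whitespace, then capitalize each word.
clean_name = raw_name.strip().title()
print(clean_name)

words = 'This is Ricardo'.split(' ')
joined = '-'.join(words)         # join reverses what split did
print(joined)

replaced = 'This is Ricardo'.replace('Ricardo', 'Benjamin')
print(replaced)
```

Strings are immutable, so each of these methods returns a new string and leaves the original unchanged.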

## Try - Except (Try Catch)

In [80]:
a = [1,2,3]
b = [1,0,1]

for x, y in zip(a, b):
    print(x/y)

1.0


ZeroDivisionError: division by zero

In [81]:
a = [10,23,16]
b = [22,0,87]

for x, y in zip(a, b):
    try:
        print(x/y)
    except ZeroDivisionError:
        print('Can not Divide by 0')

0.45454545454545453
Can not Divide by 0
0.1839080459770115
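
Naming the exception type keeps unrelated errors visible instead of silently swallowing them. Below is a minimal sketch; `safe_divide` is a hypothetical helper, and returning `None` for a zero divisor is just one possible convention.

```python
def safe_divide(x, y):
    """Return x / y, or None when y is zero (hypothetical helper)."""
    try:
        result = x / y
    except ZeroDivisionError:
        # Only the error we expect is handled; anything else still surfaces.
        return None
    else:
        # Runs only when the try block raised nothing.
        return result

ratios = [safe_divide(x, y) for x, y in zip([10, 23, 16], [22, 0, 87])]
print(ratios)
```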


# Quick Introduction to Pandas

Note that the data file should be in the same directory as the notebook. Otherwise, you need to pass the full path to the file.

In [82]:
import pandas as pd

In [83]:
sales = pd.read_csv('sales.csv')

In [84]:
sales.head()

| | Sales | Advertising | Season |
|---|---|---|---|
| 0 | 4812.913689 | 879.259676 | 1 |
| 1 | 5463.871221 | 978.340501 | 1 |
| 2 | 5046.05903 | 910.952362 | 1 |
| 3 | 5177.868662 | 953.757955 | 1 |
| 4 | 5475.161549 | 942.513891 | 0 |


In [85]:
sales.head(8)

| | Sales | Advertising | Season |
|---|---|---|---|
| 0 | 4812.913689 | 879.259676 | 1 |
| 1 | 5463.871221 | 978.340501 | 1 |
| 2 | 5046.05903 | 910.952362 | 1 |
| 3 | 5177.868662 | 953.757955 | 1 |
| 4 | 5475.161549 | 942.513891 | 0 |
| 5 | 5004.630187 | 1242.710606 | 1 |
| 6 | 4875.09572 | 990.050302 | 1 |
| 7 | 5225.872981 | 996.908775 | 0 |


In [86]:
sales.tail()

| | Sales | Advertising | Season |
|---|---|---|---|
| 495 | 5457.961798 | 985.72051 | 1 |
| 496 | 4862.519668 | 965.562369 | 1 |
| 497 | 5352.141958 | 941.423624 | 1 |
| 498 | 4710.272718 | 992.114575 | 1 |
| 499 | 5178.361756 | 1098.57344 | 1 |


In [87]:
sales.describe()

| | Sales | Advertising | Season |
|---|---|---|---|
| count | 500.0 | 500.0 | 500.0 |
| mean | 5008.34787 | 1000.189637 | 0.808 |
| std | 254.481174 | 100.58262 | 0.394268 |
| min | 4258.043387 | 724.653163 | 0.0 |
| 25% | 4827.496565 | 936.300676 | 1.0 |
| 50% | 5007.672225 | 1003.362844 | 1.0 |
| 75% | 5175.15615 | 1067.867887 | 1.0 |
| max | 5793.442845 | 1290.178827 | 1.0 |


In [88]:
sales['Advertising'] > 950

0      False
1       True
2      False
3       True
4      False
       ...  
495     True
496     True
497    False
498     True
499     True
Name: Advertising, Length: 500, dtype: bool

In [89]:
sales[sales['Advertising'] > 950]

| | Sales | Advertising | Season |
|---|---|---|---|
| 1 | 5463.871221 | 978.340501 | 1 |
| 3 | 5177.868662 | 953.757955 | 1 |
| 5 | 5004.630187 | 1242.710606 | 1 |
| 6 | 4875.095720 | 990.050302 | 1 |
| 7 | 5225.872981 | 996.908775 | 0 |
| ... | ... | ... | ... |
| 494 | 5062.386931 | 1062.877178 | 1 |
| 495 | 5457.961798 | 985.720510 | 1 |
| 496 | 4862.519668 | 965.562369 | 1 |
| 498 | 4710.272718 | 992.114575 | 1 |


In [90]:
sales[(sales['Advertising'] > 950) & ( sales['Season'] == 1 )]

| | Sales | Advertising | Season |
|---|---|---|---|
| 1 | 5463.871221 | 978.340501 | 1 |
| 3 | 5177.868662 | 953.757955 | 1 |
| 5 | 5004.630187 | 1242.710606 | 1 |
| 6 | 4875.095720 | 990.050302 | 1 |
| 8 | 5007.715626 | 1105.695195 | 1 |
| ... | ... | ... | ... |
| 494 | 5062.386931 | 1062.877178 | 1 |
| 495 | 5457.961798 | 985.720510 | 1 |
| 496 | 4862.519668 | 965.562369 | 1 |
| 498 | 4710.272718 | 992.114575 | 1 |
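
The filtering pattern can be tried without the CSV file. The sketch below builds a tiny DataFrame with invented numbers and the same column names. In pandas, conditions are combined with `&`, `|`, and `~` rather than `and`, `or`, and `not`, and each condition needs its own parentheses when written inline.

```python
import pandas as pd

# Hypothetical miniature of the sales data; the numbers are made up.
df = pd.DataFrame({
    'Sales': [4800.0, 5400.0, 5000.0, 5200.0],
    'Advertising': [880.0, 980.0, 910.0, 1240.0],
    'Season': [1, 1, 0, 0],
})

high_ads = df['Advertising'] > 950   # boolean Series, one value per row
in_season = df['Season'] == 1

both = df[high_ads & in_season]      # both conditions hold
either = df[high_ads | in_season]    # at least one condition holds
low_ads = df[~high_ads]              # negation of the first condition

print(both)
print(either)
print(low_ads)
```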


In [91]:
pd.pivot_table(index='Season',values=['Sales','Advertising'],aggfunc='sum',data=sales)

| Season | Advertising | Sales |
|---|---|---|
| 0 | 95613.325998 | 478108.1 |
| 1 | 404481.492376 | 2026066.0 |


In [92]:
pd.pivot_table(index='Season',values=['Advertising'],aggfunc='sum',data=sales)

| Season | Advertising |
|---|---|
| 0 | 95613.325998 |
| 1 | 404481.492376 |
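
A pivot table that sums columns per group can also be written with `groupby`. The sketch below uses a small invented DataFrame so it runs without `sales.csv`; the numbers are not from the real data.

```python
import pandas as pd

# Hypothetical data with the same column names as the sales file.
df = pd.DataFrame({
    'Sales': [100.0, 200.0, 300.0],
    'Advertising': [10.0, 20.0, 30.0],
    'Season': [0, 1, 1],
})

# Group the rows by Season, then sum the selected columns within each group.
totals = df.groupby('Season')[['Advertising', 'Sales']].sum()
print(totals)
```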
