## **Data manipulation and processing**

* Create dataframe using Pandas
* Perform row level manipulations such as drop rows and columns
* Fill in Null values
* GroupBy and use **Describe** for dataframe
* Concatenating and Merging dataframes

### **Create dataframe using Pandas**

In [None]:
# import libraries

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# Create a dataframe called df

df = pd.DataFrame({'A':[1,2,np.nan],'B':[5,np.nan,np.nan],'C':[1,2,3]})
df['States']="CA NV AZ".split()
df.set_index('States',inplace=True)
print(df)

          A    B  C
States             
CA      1.0  5.0  1
NV      2.0  NaN  2
AZ      NaN  NaN  3


### **Perform row level manipulations such as drop rows and columns**

In [None]:
print("\nDropping any rows with a NaN value\n",'-'*35, sep='')
print(df.dropna(axis=0))



Dropping any rows with a NaN value
-----------------------------------
          A    B  C
States             
CA      1.0  5.0  1


In [None]:
print("\nDropping any column with a NaN value\n",'-'*35, sep='')
print(df.dropna(axis=1))



Dropping any column with a NaN value
-----------------------------------
        C
States   
CA      1
NV      2
AZ      3


In [None]:
print("\nDropping any row with 2 or more NaN values using the 'thresh' parameter\n",'-'*68, sep='')
# thresh is the minimum number of non-NaN values a row needs in order to be kept
print(df.dropna(axis=0, thresh=2))


Dropping any row with 2 or more NaN values using the 'thresh' parameter
--------------------------------------------------------------------
          A    B  C
States             
CA      1.0  5.0  1
NV      2.0  NaN  2
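
The `thresh` parameter is easy to misread: it is the minimum number of non-NaN values a row must contain to be kept, not a count of NaNs. The sketch below rebuilds the same three-row frame under the hypothetical name `df_demo` (an assumption, so the snippet runs on its own) and counts the non-missing values per row to show which rows survive a given threshold.

```python
import numpy as np
import pandas as pd

# Same small frame as above, rebuilt so the snippet runs on its own
df_demo = pd.DataFrame(
    {"A": [1, 2, np.nan], "B": [5, np.nan, np.nan], "C": [1, 2, 3]},
    index=["CA", "NV", "AZ"],
)

# Number of non-missing values in each row: CA has 3, NV has 2, AZ has 1
non_missing = df_demo.notna().sum(axis=1)
print(non_missing)

# thresh=2 keeps only the rows with at least 2 non-missing values
kept = df_demo.dropna(axis=0, thresh=2)
print(kept)
```

With only three columns, a threshold of 4 can never be met, so `df_demo.dropna(thresh=4)` returns an empty frame.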


### **Fill in Null values**

In [None]:
print("\nFilling values with a default value\n",'-'*35, sep='')
print(df.fillna(value='FILL VALUE')) # You can include any string value here to fill null values



Filling values with a default value
-----------------------------------
                 A           B  C
States                           
CA               1           5  1
NV               2  FILL VALUE  2
AZ      FILL VALUE  FILL VALUE  3


In [None]:
print("\nFilling values with a computed value (mean of column A here)\n",'-'*60, sep='')
print(df.fillna(value=df['A'].mean()))


Filling values with a computed value (mean of column A here)
------------------------------------------------------------
          A    B  C
States             
CA      1.0  5.0  1
NV      2.0  1.5  2
AZ      1.5  1.5  3
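
Passing a single number to `fillna` puts that number into every gap, whatever the column. If each column should instead receive its own mean, one option is to pass the Series returned by `mean()`. This is a minimal sketch on a rebuilt copy of the frame (the name `df_demo` is an assumption):

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {"A": [1, 2, np.nan], "B": [5, np.nan, np.nan], "C": [1, 2, 3]},
    index=["CA", "NV", "AZ"],
)

# df_demo.mean() is a Series of per-column means (A=1.5, B=5.0, C=2.0);
# fillna matches it to the columns, so each gap gets its own column's mean
filled = df_demo.fillna(df_demo.mean())
print(filled)
```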


### **GroupBy** and use **Describe** for dataframe

In [None]:
# Create dataframe
data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
       'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
       'Sales':[200,120,340,124,243,350]}
df = pd.DataFrame(data)
df

Unnamed: 0,Company,Person,Sales
0,GOOG,Sam,200
1,GOOG,Charlie,120
2,MSFT,Amy,340
3,MSFT,Vanessa,124
4,FB,Carl,243
5,FB,Sarah,350


In [None]:
byComp = df.groupby('Company')
print("\nGrouping by 'Company' column and listing mean sales\n",'-'*55, sep='')
print(byComp.mean(numeric_only=True)) # numeric_only=True skips the text column 'Person'



Grouping by 'Company' column and listing mean sales
-------------------------------------------------------
         Sales
Company       
FB       296.5
GOOG     160.0
MSFT     232.0


In [None]:
print("\nGrouping by 'Company' column and listing sum of sales\n",'-'*55, sep='')
print(byComp.sum(numeric_only=True)) # numeric_only=True skips the text column 'Person'



Grouping by 'Company' column and listing sum of sales
-------------------------------------------------------
         Sales
Company       
FB         593
GOOG       320
MSFT       464


In [None]:
print("\nAll in one line of command (Stats for 'FB')\n",'-'*65, sep='')
print(pd.DataFrame(df.groupby('Company').describe().loc['FB']).transpose())



All in one line of command (Stats for 'FB')
-----------------------------------------------------------------
   Sales                                                       
   count   mean        std    min     25%    50%     75%    max
FB   2.0  296.5  75.660426  243.0  269.75  296.5  323.25  350.0


In [None]:
(pd.DataFrame(df.groupby('Company').describe().loc['FB'])).transpose()

Unnamed: 0_level_0,Sales,Sales,Sales,Sales,Sales,Sales,Sales,Sales
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max
FB,2.0,296.5,75.660426,243.0,269.75,296.5,323.25,350.0


In [None]:
print("\nSame type of extraction with little different command\n",'-'*68, sep='')
print(df.groupby('Company').describe().loc[['GOOG', 'MSFT']])


Same type of extraction with little different command
--------------------------------------------------------------------
        Sales                                                      
        count   mean         std    min    25%    50%    75%    max
Company                                                            
GOOG      2.0  160.0   56.568542  120.0  140.0  160.0  180.0  200.0
MSFT      2.0  232.0  152.735065  124.0  178.0  232.0  286.0  340.0
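
`describe()` returns every summary statistic at once. When only a few are needed, selecting the column and calling `agg` with a list of function names is a common alternative. A minimal sketch, rebuilding the sales data under the assumed name `sales`:

```python
import pandas as pd

sales = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# Select the numeric column first, then ask only for the statistics needed
summary = sales.groupby("Company")["Sales"].agg(["mean", "sum", "max"])
print(summary)
```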


### **Concatenating and Merging dataframes**

In [None]:
# Merging two data frames
# Creating data frames
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3'],
                        'C': ['C0', 'C1', 'C2', 'C3'],
                        'D': ['D0', 'D1', 'D2', 'D3']},
                        index=[0, 1, 2, 3])


In [None]:
df1

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


In [None]:
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                        'B': ['B4', 'B5', 'B6', 'B7'],
                        'C': ['C4', 'C5', 'C6', 'C7'],
                        'D': ['D4', 'D5', 'D6', 'D7']},
                         index=[0, 1, 2, 3])


In [None]:
df2

Unnamed: 0,A,B,C,D
0,A4,B4,C4,D4
1,A5,B5,C5,D5
2,A6,B6,C6,D6
3,A7,B7,C7,D7


In [None]:
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                        'B': ['B8', 'B9', 'B10', 'B11'],
                        'C': ['C8', 'C9', 'C10', 'C11'],
                        'D': ['D8', 'D9', 'D10', 'D11']},
                        index=[8,9,10,11])


In [None]:
df3

Unnamed: 0,A,B,C,D
8,A8,B8,C8,D8
9,A9,B9,C9,D9
10,A10,B10,C10,D10
11,A11,B11,C11,D11


In [None]:
print("\nThe DataFrame number 1\n",'-'*30, sep='')
print(df1)



The DataFrame number 1
------------------------------
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3


In [None]:
print("\nThe DataFrame number 2\n",'-'*30, sep='')
print(df2)



The DataFrame number 2
------------------------------
    A   B   C   D
0  A4  B4  C4  D4
1  A5  B5  C5  D5
2  A6  B6  C6  D6
3  A7  B7  C7  D7


In [None]:
print("\nThe DataFrame number 3\n",'-'*30, sep='')
print(df3)


The DataFrame number 3
------------------------------
      A    B    C    D
8    A8   B8   C8   D8
9    A9   B9   C9   D9
10  A10  B10  C10  D10
11  A11  B11  C11  D11


In [None]:
#concatenation
df_cat1 = pd.concat([df1,df2,df3], axis=0)
print("\nAfter concatenation along row\n",'-'*30, sep='')
print(df_cat1)
df_cat1.loc[2]


After concatenation along row
------------------------------
      A    B    C    D
0    A0   B0   C0   D0
1    A1   B1   C1   D1
2    A2   B2   C2   D2
3    A3   B3   C3   D3
0    A4   B4   C4   D4
1    A5   B5   C5   D5
2    A6   B6   C6   D6
3    A7   B7   C7   D7
8    A8   B8   C8   D8
9    A9   B9   C9   D9
10  A10  B10  C10  D10
11  A11  B11  C11  D11


Unnamed: 0,A,B,C,D
2,A2,B2,C2,D2
2,A6,B6,C6,D6


In [None]:
df_cat1.iloc[4]

A    A4
B    B4
C    C4
D    D4
Name: 0, dtype: object
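
Concatenating along rows keeps each frame's original index labels, which is why `.loc[2]` returns two rows while `.iloc[4]` returns exactly one. If the old labels carry no meaning, `ignore_index=True` renumbers the result. A small sketch with two hypothetical frames, `part1` and `part2`:

```python
import pandas as pd

part1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
part2 = pd.DataFrame({"A": ["A4", "A5"], "B": ["B4", "B5"]}, index=[0, 1])

# Default: the original labels are kept, so 0 and 1 each appear twice
stacked = pd.concat([part1, part2], axis=0)

# ignore_index=True discards the old labels and numbers the rows from 0
renumbered = pd.concat([part1, part2], axis=0, ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```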

In [None]:
df_cat2 = pd.concat([df1,df2,df3], axis=1)
print("\nAfter concatenation along column\n",'-'*60, sep='')
print(df_cat2)



After concatenation along column
------------------------------------------------------------
      A    B    C    D    A    B    C    D    A    B    C    D
0    A0   B0   C0   D0   A4   B4   C4   D4  NaN  NaN  NaN  NaN
1    A1   B1   C1   D1   A5   B5   C5   D5  NaN  NaN  NaN  NaN
2    A2   B2   C2   D2   A6   B6   C6   D6  NaN  NaN  NaN  NaN
3    A3   B3   C3   D3   A7   B7   C7   D7  NaN  NaN  NaN  NaN
8   NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   A8   B8   C8   D8
9   NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   A9   B9   C9   D9
10  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  A10  B10  C10  D10
11  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  A11  B11  C11  D11


In [None]:
df_cat2.fillna(value=0, inplace=True)
print("\nAfter filling missing values with zero\n",'-'*60, sep='')
print(df_cat2)


After filling missing values with zero
------------------------------------------------------------
     A   B   C   D   A   B   C   D    A    B    C    D
0   A0  B0  C0  D0  A4  B4  C4  D4    0    0    0    0
1   A1  B1  C1  D1  A5  B5  C5  D5    0    0    0    0
2   A2  B2  C2  D2  A6  B6  C6  D6    0    0    0    0
3   A3  B3  C3  D3  A7  B7  C7  D7    0    0    0    0
8    0   0   0   0   0   0   0   0   A8   B8   C8   D8
9    0   0   0   0   0   0   0   0   A9   B9   C9   D9
10   0   0   0   0   0   0   0   0  A10  B10  C10  D10
11   0   0   0   0   0   0   0   0  A11  B11  C11  D11


In [None]:
# merging by a common key

In [None]:
left = pd.DataFrame({'key': ['K0', 'K8', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                          'C': ['C0', 'C1', 'C2', 'C3'],
                          'D': ['D0', 'D1', 'D2', 'D3']})


In [None]:
left

Unnamed: 0,key,A,B
0,K0,A0,B0
1,K8,A1,B1
2,K2,A2,B2
3,K3,A3,B3


In [None]:
right

Unnamed: 0,key,C,D
0,K0,C0,D0
1,K1,C1,D1
2,K2,C2,D2
3,K3,C3,D3


In [None]:
print("\nThe DataFrame 'left'\n",'-'*30, sep='')
print(left)



The DataFrame 'left'
------------------------------
  key   A   B
0  K0  A0  B0
1  K8  A1  B1
2  K2  A2  B2
3  K3  A3  B3


In [None]:
print("\nThe DataFrame 'right'\n",'-'*30, sep='')
print(right)


The DataFrame 'right'
------------------------------
  key   C   D
0  K0  C0  D0
1  K1  C1  D1
2  K2  C2  D2
3  K3  C3  D3


In [None]:
merge1= pd.merge(left,right,how='inner',on='key')
print("\nAfter simple merging with 'inner' method\n",'-'*50, sep='')
print(merge1)


After simple merging with 'inner' method
--------------------------------------------------
  key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K2  A2  B2  C2  D2
2  K3  A3  B3  C3  D3


In [None]:
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

In [None]:
left

Unnamed: 0,key1,key2,A,B
0,K0,K0,A0,B0
1,K0,K1,A1,B1
2,K1,K0,A2,B2
3,K2,K1,A3,B3


In [None]:
right

Unnamed: 0,key1,key2,C,D
0,K0,K0,C0,D0
1,K1,K0,C1,D1
2,K1,K0,C2,D2
3,K2,K0,C3,D3


In [None]:
pd.merge(left, right, on=['key1', 'key2'])

Unnamed: 0,key1,key2,A,B,C,D
0,K0,K0,A0,B0,C0,D0
1,K1,K0,A2,B2,C1,D1
2,K1,K0,A2,B2,C2,D2


In [None]:
pd.merge(left, right, how='left',on=['key1', 'key2'])

Unnamed: 0,key1,key2,A,B,C,D
0,K0,K0,A0,B0,C0,D0
1,K0,K1,A1,B1,,
2,K1,K0,A2,B2,C1,D1
3,K1,K0,A2,B2,C2,D2
4,K2,K1,A3,B3,,


In [None]:
pd.merge(left, right, how='right',on=['key1', 'key2'])

Unnamed: 0,key1,key2,A,B,C,D
0,K0,K0,A0,B0,C0,D0
1,K1,K0,A2,B2,C1,D1
2,K1,K0,A2,B2,C2,D2
3,K2,K0,,,C3,D3
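
The `how` argument decides which keys survive a merge. To see where each row of the result came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column. The sketch below uses trimmed copies of the single-key frames (the names `left_demo` and `right_demo` are assumptions) with an outer merge:

```python
import pandas as pd

left_demo = pd.DataFrame({"key": ["K0", "K8", "K2", "K3"],
                          "A": ["A0", "A1", "A2", "A3"]})
right_demo = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"],
                           "C": ["C0", "C1", "C2", "C3"]})

# how='outer' keeps every key from both sides; indicator=True adds a
# '_merge' column with the values 'left_only', 'right_only' or 'both'
both = pd.merge(left_demo, right_demo, how="outer", on="key", indicator=True)
print(both)
```

Rows marked `left_only` or `right_only` are the ones an inner merge would drop.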


In [None]:
#join operators
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                      index=['K0', 'K1', 'K2'])

right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                    'D': ['D0', 'D2', 'D3']},
                      index=['K0', 'K2', 'K3'])

In [None]:
left

Unnamed: 0,A,B
K0,A0,B0
K1,A1,B1
K2,A2,B2


In [None]:
right

Unnamed: 0,C,D
K0,C0,D0
K2,C2,D2
K3,C3,D3


In [None]:
left.join(right)

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K1,A1,B1,,
K2,A2,B2,C2,D2


In [None]:
left.join(right, how='outer')

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K1,A1,B1,,
K2,A2,B2,C2,D2
K3,,,C3,D3
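
`join` matches rows on the index, so it can be read as a shorthand for `pd.merge` with `left_index=True` and `right_index=True`. A minimal sketch comparing the two on one-column versions of the frames (the names `left_demo` and `right_demo` are assumptions):

```python
import pandas as pd

left_demo = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=["K0", "K1", "K2"])
right_demo = pd.DataFrame({"C": ["C0", "C2", "C3"]}, index=["K0", "K2", "K3"])

# Index-based join, keeping only the labels present on both sides
joined = left_demo.join(right_demo, how="inner")

# The same operation spelled out with merge
merged = pd.merge(left_demo, right_demo, how="inner",
                  left_index=True, right_index=True)

print(joined)
print(merged)
```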


In [None]:
# use of apply functions

In [None]:
# Define a function
def testfunc(x):
    if (x> 500):
        return (10*np.log10(x))
    else:
        return (x/10)

In [None]:
df = pd.DataFrame({'col1':[1,2,3,4,5,6,7,8,9,10],
                   'col2':[444,555,666,444,333,222,666,777,666,555],
                   'col3':'aaa bb c dd eeee fff gg h iii j'.split()})
df

Unnamed: 0,col1,col2,col3
0,1,444,aaa
1,2,555,bb
2,3,666,c
3,4,444,dd
4,5,333,eeee
5,6,222,fff
6,7,666,gg
7,8,777,h
8,9,666,iii
9,10,555,j


In [None]:
df['FuncApplied'] = df['col2'].apply(lambda x : np.log(x))
print(df)

   col1  col2  col3  FuncApplied
0     1   444   aaa     6.095825
1     2   555    bb     6.318968
2     3   666     c     6.501290
3     4   444    dd     6.095825
4     5   333  eeee     5.808142
5     6   222   fff     5.402677
6     7   666    gg     6.501290
7     8   777     h     6.655440
8     9   666   iii     6.501290
9    10   555     j     6.318968


In [None]:
df['col3length']= df['col3'].apply(len)
print(df)

   col1  col2  col3  FuncApplied  col3length
0     1   444   aaa     6.095825           3
1     2   555    bb     6.318968           2
2     3   666     c     6.501290           1
3     4   444    dd     6.095825           2
4     5   333  eeee     5.808142           4
5     6   222   fff     5.402677           3
6     7   666    gg     6.501290           2
7     8   777     h     6.655440           1
8     9   666   iii     6.501290           3
9    10   555     j     6.318968           1


In [None]:
df['FuncApplied'].apply(lambda x: np.sqrt(x))

0    2.468972
1    2.513756
2    2.549763
3    2.468972
4    2.410009
5    2.324366
6    2.549763
7    2.579814
8    2.549763
9    2.513756
Name: FuncApplied, dtype: float64

In [None]:
print("\nSum of the column 'FuncApplied' is: ",df['FuncApplied'].sum())



Sum of the column 'FuncApplied' is:  62.19971458619886


In [None]:
print("Mean of the column 'FuncApplied' is: ",df['FuncApplied'].mean())


Mean of the column 'FuncApplied' is:  6.219971458619886


In [None]:
print("Std dev of the column 'FuncApplied' is: ",df['FuncApplied'].std())


Std dev of the column 'FuncApplied' is:  0.3822522801574853


In [None]:
print("Min and max of the column 'FuncApplied' are: ",df['FuncApplied'].min(),"and",df['FuncApplied'].max())

Min and max of the column 'FuncApplied' are:  5.402677381872279 and 6.655440350367647


In [None]:
### Deletion, sorting, list of column and row names

In [None]:
print("\nName of columns\n",'-'*20, sep='')
print(df.columns)



Name of columns
--------------------
Index(['col1', 'col2', 'col3', 'FuncApplied', 'col3length'], dtype='object')


In [None]:
l = list(df.columns)
print("\nColumn names in a list of strings for later manipulation:",l)


Column names in a list of strings for later manipulation: ['col1', 'col2', 'col3', 'FuncApplied', 'col3length']


In [None]:
print("\nDeleting last column by 'del' command\n",'-'*50, sep='')
del df['col3length']
print(df)
df['col3length']= df['col3'].apply(len)


Deleting last column by 'del' command
--------------------------------------------------
   col1  col2  col3  FuncApplied
0     1   444   aaa     6.095825
1     2   555    bb     6.318968
2     3   666     c     6.501290
3     4   444    dd     6.095825
4     5   333  eeee     5.808142
5     6   222   fff     5.402677
6     7   666    gg     6.501290
7     8   777     h     6.655440
8     9   666   iii     6.501290
9    10   555     j     6.318968


In [None]:
df.sort_values(by='col2') #inplace=False by default

Unnamed: 0,col1,col2,col3,FuncApplied,col3length
5,6,222,fff,5.402677,3
4,5,333,eeee,5.808142,4
0,1,444,aaa,6.095825,3
3,4,444,dd,6.095825,2
1,2,555,bb,6.318968,2
9,10,555,j,6.318968,1
2,3,666,c,6.50129,1
6,7,666,gg,6.50129,2
8,9,666,iii,6.50129,3
7,8,777,h,6.65544,1


In [None]:
df.sort_values(by='FuncApplied',ascending=False) #inplace=False by default

Unnamed: 0,col1,col2,col3,FuncApplied,col3length
7,8,777,h,6.65544,1
2,3,666,c,6.50129,1
6,7,666,gg,6.50129,2
8,9,666,iii,6.50129,3
1,2,555,bb,6.318968,2
9,10,555,j,6.318968,1
0,1,444,aaa,6.095825,3
3,4,444,dd,6.095825,2
4,5,333,eeee,5.808142,4
5,6,222,fff,5.402677,3


In [None]:
df = pd.DataFrame({'col1':[1,2,3,np.nan],
                   'col2':[None,555,666,444],
                   'col3':['abc','def','ghi','xyz']})
df.head()

Unnamed: 0,col1,col2,col3
0,1.0,,abc
1,2.0,555.0,def
2,3.0,666.0,ghi
3,,444.0,xyz


In [None]:
df.isnull()

Unnamed: 0,col1,col2,col3
0,False,True,False
1,False,False,False
2,False,False,False
3,True,False,False


In [None]:
df.fillna('FILL')

Unnamed: 0,col1,col2,col3
0,1,FILL,abc
1,2,555,def
2,3,666,ghi
3,FILL,444,xyz


In [None]:
df1


Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


In [None]:
df2

Unnamed: 0,A,B,C,D
0,A4,B4,C4,D4
1,A5,B5,C5,D5
2,A6,B6,C6,D6
3,A7,B7,C7,D7


In [None]:
df3

Unnamed: 0,A,B,C,D
8,A8,B8,C8,D8
9,A9,B9,C9,D9
10,A10,B10,C10,D10
11,A11,B11,C11,D11


In [None]:
pd.merge(df1, df2, how='inner')

Unnamed: 0,A,B,C,D


In [None]:
pd.merge(df1, df2, how='outer')

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
4,A4,B4,C4,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


In [None]:
pd.merge(df1, df2, how='left')

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


In [None]:
pd.merge(df1, df2, how='right')

Unnamed: 0,A,B,C,D
0,A4,B4,C4,D4
1,A5,B5,C5,D5
2,A6,B6,C6,D6
3,A7,B7,C7,D7
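
None of the four calls above passes `on`. In that case `pd.merge` uses every column the two frames have in common as the key, which is why the inner merge is empty: no row agrees on all of A, B, C and D. A small sketch with two hypothetical two-column frames makes the implicit key explicit:

```python
import pandas as pd

first = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
second = pd.DataFrame({"A": ["A4", "A5"], "B": ["B4", "B5"]})

# With no 'on' argument, merge uses every column the two frames share
implicit = pd.merge(first, second, how="outer")
explicit = pd.merge(first, second, how="outer", on=["A", "B"])
print(implicit.equals(explicit))

# No row matches on both A and B, so the inner merge has no rows
inner = pd.merge(first, second, how="inner")
print(len(inner))
```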


## **Python container objects and functions** for data processing

In this session, we are going to learn the following key topics:
- Tuples
- Set
- Dictionary
- Function
- Lambda
- Iterator
- Generator
- Map
- Reduce
- Filter

### Tuples

In Python, tuples are similar to lists, but they are immutable, i.e. they cannot be changed. You would use tuples to represent data that shouldn't change, such as the days of the week or dates on a calendar.

In this section, we will get a brief overview of the following key topics:

    1.) Constructing Tuples
    2.) Basic Tuple Methods
    3.) Immutability
    4.) When to Use Tuples

You'll have an intuition for how to use tuples based on what you've learned about lists. Tuples work very much like lists; the major difference is that tuples are immutable.

#### Constructing Tuples

Tuples are constructed with parentheses `()`, with the elements separated by commas. For example:

In [None]:
# Create a tuple
t = (1,2,3)

In [None]:
# Check the type (len() also works on tuples, just like on lists)
type(t)

tuple

In [None]:
# Can also mix object types
t = ('one',2)

# For comparison: a list is mutable, so it can be sorted in place
l = ['sdf','sf']
l.sort()
l

['sdf', 'sf']

In [None]:
# Use indexing just like we did in lists
t[0]

'one'

In [None]:
# Negative indexing works just like a list
t[-1]

2

#### Basic Tuple Methods

Tuples have built-in methods, but not as many as lists do. Let's see two samples of tuple built-in methods:

In [None]:
# Use .index to look up the position of a value; a value that is absent raises ValueError
t.index(89)

ValueError: tuple.index(x): x not in tuple

In [None]:
# Use .count to count the number of times a value appears
t.count('one')

1

#### Immutability

The immutability of tuples can't be stressed enough. To drive that point home:

In [None]:
t[0]= 'change'

TypeError: 'tuple' object does not support item assignment

Because tuples are immutable, they can't grow. Once a tuple is made, we cannot add to it.

In [None]:
t.append('nope')

AttributeError: 'tuple' object has no attribute 'append'

#### When to use Tuples

You may be wondering, "Why bother using tuples when they have so few methods available?"

Tuples are not used as often as lists in programming, but they are the right choice when immutability is necessary. If you are passing an object around and need to make sure it does not get changed, a tuple is your solution. It provides a convenient source of data integrity.

You should now be able to create and use tuples in your programs and understand their immutability.
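
Two everyday uses follow from the properties described above: a tuple can be unpacked into separate names, and because it is immutable it can serve as a dictionary key as long as its items are hashable too. A short sketch (all names here are illustrative):

```python
# Tuples can be unpacked into separate names in one step
point = (3, 4)
x_coord, y_coord = point
print(x_coord, y_coord)

# A tuple of hashable items can be used as a dictionary key
distances = {(0, 0): 0.0, (3, 4): 5.0}
print(distances[point])

# A list cannot, because it is mutable and therefore unhashable
try:
    {[3, 4]: 5.0}
except TypeError as err:
    print("lists cannot be keys:", err)
```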

### Sets

Sets are an unordered collection of *unique* elements which can be constructed using the set() function.

Let's go ahead and create a set to see how it works.

In [None]:
x = set()

In [None]:
# We add to sets with the add() method
x.add(1)

In [None]:
#Show
x

{1}

Note that the curly brackets do not indicate a dictionary here, although you can think of a set as a dictionary that has only keys.

A set holds only unique entries. So what happens when we try to add something that is already in the set?

In [None]:
# Add a different element
x.add(2)

In [None]:
#Show
x

{1, 2}

In [None]:
# Try to add the same element
x.add(1)

In [None]:
#Show
x

{1, 2}

Notice how it won't place another 1 there, because a set only holds unique elements. We can also cast a list with repeated elements to a set to get its unique elements. For example:

In [None]:
# Create a list with repeats
l = [1,1,2,2,3,4,5,6,1,1]

In [None]:
# Cast as set to get unique values
set(l)

{1, 2, 3, 4, 5, 6}
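
Beyond removing duplicates, sets support the usual operations from set theory through operators. A brief sketch with two illustrative sets:

```python
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

print(evens | small)   # union: elements in either set
print(evens & small)   # intersection: elements in both sets
print(evens - small)   # difference: in evens but not in small
print(3 in small)      # membership test
```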

### Dictionaries

We learned about sequences in the previous session. Now let's switch gears and learn about mappings in Python. Dictionaries are Python's mapping type; they correspond to what other programming languages call hash tables.

In this section, we will briefly cover the following:

    1.) Constructing a Dictionary
    2.) Accessing objects from a Dictionary
    3.) Nesting Dictionaries
    4.) Basic Dictionary Methods

Before we dive into the details, let's understand what mappings are.

Mappings are collections of objects that are stored by a "key". Unlike a sequence, which stores objects by their relative position, a mapping stores and retrieves objects by key. This is an important distinction: you look items up by key, not by position. (Since Python 3.7 dictionaries do preserve insertion order, but they still cannot be indexed by position.)

A Python dictionary consists of keys, each with an associated value. That value can be almost any Python object.


#### Constructing a Dictionary
Let's see how we can construct dictionaries to get a better understanding of how they work!

In [None]:
# Make a dictionary with {} and : to signify a key and a value
# 'key1' appears twice below, so only its last value ('abc') is kept
my_dict = {True:'value1','key2':'value2','key1':'valuedfvdfg','key1':'abc'}
my_dict

{True: 'value1', 'key2': 'value2', 'key1': 'abc'}

In [None]:
# Call values by their key
my_dict['key2']

'value2'

Note that dictionaries are very flexible in the data types they can hold. For example:

In [None]:
my_dict = {'key1':123,'key2':[12,23,33],'key3':['item0','item1','item2']}

In [None]:
#Let's call items from the dictionary
my_dict['key2'][2]

33

In [None]:
# Can call an index on that value
my_dict['key3'][0]

'item0'

In [None]:
#Can then even call methods on that value
my_dict['key3'][0].upper()

'ITEM0'

We can change the value stored under a key as well. For instance:

In [None]:
my_dict['key1']

123

In [None]:
# Subtract 123 from the value
my_dict['key1'] = my_dict['key1'] - 123

In [None]:
#Check
my_dict['key1']

0

Note that Python has built-in shorthand for in-place subtraction and addition (and multiplication and division). We could also use += or -= for the above statement. For example:

In [None]:
# Set the object equal to itself minus 123
my_dict['key1'] -= 123
my_dict['key1']

-123

We can also create keys by assignment. For instance if we started off with an empty dictionary, we could continually add to it:

In [None]:
# Create a new dictionary
d = {}
type(d)

dict

In [None]:
# Create a new key through assignment
d['animal'] = 'Dog'
d

{'animal': 'Dog'}

In [None]:
# Can do this with any object
d['answer'] = 42

In [None]:
#Show
d

{'animal': 'Dog', 'answer': 42}

#### Nesting with Dictionaries

Let's see how flexible Python is with nesting objects and calling methods on them. Here is a dictionary nested inside a dictionary:

In [None]:
# Dictionary nested inside a dictionary nested inside a dictionary
d = {'key1':{'nestkey':{'subnestkey':'value'}}}

That's quite the inception of dictionaries. Now let's see how we can work our way toward that value:

In [None]:
# Keep calling the keys
d['key1']['nestkey']

{'subnestkey': 'value'}
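
The lookup above stops one level short of the innermost value. Chaining one more key reaches it, and `get()` offers a way to ask for a key that may not exist without raising `KeyError`. A minimal sketch (the name `d_nested` is an assumption, used to avoid overwriting `d`):

```python
d_nested = {'key1': {'nestkey': {'subnestkey': 'value'}}}

# One more lookup reaches the innermost value
inner = d_nested['key1']['nestkey']['subnestkey']
print(inner)

# get() returns a default instead of raising KeyError for a missing key
missing = d_nested['key1'].get('no_such_key', 'not found')
print(missing)
```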

#### A few Dictionary Methods

There are a few methods we can call on a dictionary. Let's get a quick introduction to a few methods:

In [None]:
# Create a typical dictionary
d = {'key1':1,'key2':2,'key3':3}

In [None]:
# Method to return a view of all keys (wrap it in list() to index into it)
f=d.keys()
list(f)[0]

'key1'

In [None]:
# Method to grab all values
type(d.values())

dict_values

In [None]:
# Method to return all items as (key, value) tuples
d.items()

dict_items([('key1', 1), ('key2', 2), ('key3', 3)])
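
In Python 3, `keys()`, `values()` and `items()` return view objects rather than lists. The most common use of `items()` is looping over key and value together, and a view stays in sync with the dictionary. A short sketch (the name `d_demo` is an assumption):

```python
d_demo = {'key1': 1, 'key2': 2, 'key3': 3}

# items() yields (key, value) tuples, which unpack neatly in a for loop
for key, value in d_demo.items():
    print(key, '->', value)

# A view reflects later changes to the dictionary
keys_view = d_demo.keys()
d_demo['key4'] = 4
print(list(keys_view))
```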

#### Dictionary Comprehensions

Just like lists, dictionaries support their own version of comprehension for quick creation. It is not as commonly used as a list comprehension, but the syntax is:

In [None]:
{x:x**2 for x in range(10)}

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

One reason dictionary comprehensions are less common is that it is hard to construct key names that are not derived from the values.
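
When the keys do come from a separate source, `zip` is one way to pair them with values inside a comprehension. A small sketch with made-up lists `names` and `scores`:

```python
names = ['alpha', 'beta', 'gamma']
scores = [10, 20, 30]

# Pair each name with its score, keeping only scores above 10
by_name = {name: score for name, score in zip(names, scores) if score > 10}
print(by_name)
```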

### Functions

#### Introduction to Functions

What is a function in Python, and how do we create one?

Functions will be one of our main building blocks as we write larger and larger amounts of code to solve problems.

**So what is a function?**

A function groups a set of statements together so that they can be run more than once. It also lets us specify parameters that serve as inputs to the function.

Functions allow us to reuse code instead of writing it again and again. Recall from strings and lists that the len() function returns the length of a sequence. Since checking the length of a sequence is a common task, it makes sense to have a function that can do this on demand.

Functions are one of the most basic levels of code reuse in Python, and they also let us start thinking about program design.

#### def Statements

Now let us learn how to build a function and what the syntax looks like in Python.

The syntax for def statements will be in the following form:

In [None]:
def name_of_function(arg1,arg2):
    '''
    This is where the function's Document String (doc-string) goes
    '''
    # Do stuff here
    #return desired result

We begin with def, then a space, followed by the name of the function. Try to keep names relevant and as simple as possible; for example, len() is a good name for a length() function. Also be careful with names: you wouldn't want to give a function the same name as a [built-in function in Python](https://docs.python.org/2/library/functions.html) (such as len).

Next comes a pair of parentheses containing the arguments, separated by commas. These arguments are the inputs to the function, and you can reference them inside its body. The line ends with a colon.

Now comes the important step: the code inside the function must be indented. Remember that Python uses *whitespace* to organize code, which many other programming languages do not do.

Next, you'll see the doc-string, where you write a basic description of the function. In IPython and IPython Notebooks, you can read these doc-strings by pressing Shift+Tab after a function name. Doc-strings are not mandatory for simple functions, but it is good practice to include them, because they help other programmers understand the code you write.

After all this, you can begin writing the code you wish to execute.

The best way to learn functions is by going through examples. So let's try to analyze and understand examples that relate back to the various objects and data structures we learned.

#### Example 1: A simple print 'hello' function

In [None]:
def say_hello():
    print('hello')

Call the function

In [None]:
say_hello()

hello


#### Example 2: A simple greeting function
Let's write a function that greets people with their name.

In [None]:
def greeting(name):
    print('Hello %s' % name)  # %s formats any value, including a name given as a string

In [None]:
x

In [None]:
x = greeting(90)


Hello 90


In [None]:
x
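
The last cell shows nothing because a function that only prints has no return value: the call evaluates to `None`, and the notebook does not display `None`. A minimal sketch with a hypothetical `greet` function:

```python
def greet(name):
    # Prints a greeting but has no return statement
    print('Hello %s' % name)

outcome = greet('Sam')

# The call itself evaluates to None
print(outcome is None)
```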

#### Using return
Let's see some examples that use a return statement. Return allows a function to "return" a result that can then be stored as a variable, or used in whatever manner a user wants.

#### Example 3: Addition function

In [None]:
def add_num(num1,num2):
    return num1+num2

In [None]:
add_num(4,5)

9

In [None]:
# Can also save as variable due to return
result = add_num(4,5)
result

9

In [None]:
print(result)

9


What happens if we input two strings?

In [None]:
print(add_num('one','two'))

onetwo

Because we don't declare variable types in Python, this function can be used to add numbers or to join sequences together! Going forward, we'll learn about adding checks to make sure a user passes the correct arguments into a function.
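
As a preview of what such a check could look like, here is one possible sketch using `isinstance`. The name `add_num_checked` and the decision to reject everything except `int` and `float` are assumptions made for illustration, not the only reasonable design.

```python
def add_num_checked(num1, num2):
    '''
    Add two numbers, rejecting anything that is not an int or a float.
    '''
    for value in (num1, num2):
        # bool is a subclass of int, so it is excluded explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError('add_num_checked expects numbers, got %r' % (value,))
    return num1 + num2

print(add_num_checked(4, 5))

try:
    add_num_checked('one', 1)
except TypeError as err:
    print('rejected:', err)
```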

Let's also start using *break*,*continue*, and *pass* statements in our code. We introduced these during the while lecture.

Now, let's see a complete example of creating a function to check if a number is prime (a common interview exercise).

A number is prime if it is divisible only by 1 and itself. Let's write a first version of the function that checks every number from 2 to N-1 with a modulo test.

In [None]:
def is_prime(num):
    '''
    Naive method of checking for primes.
    '''
    for n in range(2,num):
        if num % n == 0:
            print('not prime')
            break
    else: # If never mod zero, then prime
        print('prime')

In [None]:
is_prime(17)

prime


Note how we break out of the loop after the print statement! We can improve this by checking only up to the square root of the target number; we can also disregard all even numbers after checking for 2. We'll also switch to returning a boolean value to get an example of using return statements:

In [None]:
import math

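def is_prime(num):
    '''
    A better method of checking for primes.
    '''
    # 0, 1 and negative numbers are not prime
    if num < 2:
        return False
    # Even numbers greater than 2 are not prime
    if num % 2 == 0 and num > 2:
        return False
    # Only odd divisors up to the square root need checking
    for i in range(3, int(math.sqrt(num)) + 1, 2):
        if num % i == 0:
            return False
    return True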
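
A quick way to sanity-check a primality function is to list the primes below a small bound and compare them with known values. The sketch below uses a copy of the square-root version under the hypothetical name `is_prime_checked`; the copy includes a guard for values below 2 so that 0 and 1 are not reported as prime.

```python
import math

def is_prime_checked(num):
    # 0, 1 and negative numbers are not prime
    if num < 2:
        return False
    # Even numbers greater than 2 are not prime
    if num % 2 == 0 and num > 2:
        return False
    # Only odd divisors up to the square root need checking
    for i in range(3, int(math.sqrt(num)) + 1, 2):
        if num % i == 0:
            return False
    return True

primes_below_30 = [n for n in range(30) if is_prime_checked(n)]
print(primes_below_30)
```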

In [None]:
for i in []:
    if 2 % i == 0:
        print("false")
print("true")

true


## **Object Oriented Programming**

- Creating Classes
- Methods


__Object Oriented Programming (OOP)__ is a programming paradigm that allows abstraction through the concept of interacting entities. It contrasts with the conventional procedural model, in which programs are organized as a sequence of commands or statements to perform.

We can think of an object as an entity that resides in memory, has a state, and is able to perform some actions.

More formally, objects are entities that represent **instances** of a general abstract concept called a **class**. In `Python`, the variables defining an object's state are called "attributes", and the actions it can perform are called "methods".

In Python, everything is an object, including classes and functions.

### **How to define classes**

#### Creating a class

Suppose we want to create a class, named Person, as a prototype, a sort of template for any number of 'Person' objects (instances).

The following python syntax defines a class:

    class ClassName(base_classes):
        statements

        

Class names should start with an uppercase letter (it's a naming convention).

Say we need to model a Person as:

* Name
* Surname  
* Age  

In [None]:
class Person:
    pass

john_doe = Person()
john_doe.name = "Alec"
john_doe.surname = "Baldwin"
john_doe.year_of_birth = 1958


print(john_doe)
print("%s %s was born in %d." %
      (john_doe.name, john_doe.surname, john_doe.year_of_birth))

<__main__.Person object at 0x7fc0fc759400>
Alec Baldwin was born in 1958.


The preceding example defines an empty class (i.e. the class doesn't have a state) called _Person_, then creates a _Person_ instance called _john_doe_ and adds three attributes to _john_doe_. We see that we can access an object's attributes using the "dot" operator.

This isn't a recommended style because classes should describe homogeneous entities. A way to do so is the following:

In [None]:
class Person:
    def __init__(self, name, surname, year_of_birth):
        self.name1 = name
        self.surname = surname
        self.year_of_birth = year_of_birth

    __init__(self, ...)
is a special _Python_ method that is called automatically after an object is constructed. Its purpose is to initialize the object's state. The first argument, __self__ by convention, is passed automatically and refers to the object itself.

In the preceding example, `__init__` adds three attributes to every object that is instantiated. So the class is actually describing each object's state.


To work with the class, we need to create an instance of it:

In [None]:
alec = Person("Alec", "Baldwin", 1958)
print(alec)
print("%s %s was born in %d." % (alec.name1, alec.surname, alec.year_of_birth))

<__main__.Person object at 0x109dab128>
Alec Baldwin was born in 1958.



We have just created an instance of the Person class, bound to the variable `alec`.

#### Methods

In [None]:
class Person:
    def __init__(self, name, surname, year_of_birth):
        self.name = name
        self.surname = surname
        self.year_of_birth = year_of_birth

    def age(self, current_year):
        return current_year - self.year_of_birth

    def __str__(self):
        return "%s %s was born in %d ." % (self.name, self.surname, self.year_of_birth)

alec = Person("Alec", "Baldwin", 1958)
print(alec)
print(alec.age(2014))


Alec Baldwin was born in 1958 .
56


We defined two more methods, `age` and `__str__`. The latter is once again a special method, called by Python when the object has to be represented as a string (e.g. when it has to be printed). If the `__str__` method isn't defined, the **print** command shows the type of the object and its address in memory. We can see that a method is called with the same syntax used for attributes (**instance_name.instance_method**).
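
To make the role of the first argument concrete, the sketch below defines a hypothetical `PersonSketch` class (named differently so it does not replace `Person`) and shows that calling a method on an instance is equivalent to calling the function on the class and passing the instance explicitly.

```python
class PersonSketch:
    def __init__(self, name, surname, year_of_birth):
        self.name = name
        self.surname = surname
        self.year_of_birth = year_of_birth

    def age(self, current_year):
        return current_year - self.year_of_birth

    def __str__(self):
        return "%s %s was born in %d." % (self.name, self.surname, self.year_of_birth)

someone = PersonSketch("Alec", "Baldwin", 1958)

# print() and str() both go through __str__
print(str(someone))

# someone.age(2014) passes 'someone' as the first argument behind the scenes
print(someone.age(2014) == PersonSketch.age(someone, 2014))
```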

#### Bad practice

It is possible to create a class without the `__init__` method, but this is not a recommended style because classes should describe homogeneous entities.

In [None]:
class Person:

    def set_name(self, name):
        self.name = name

    def set_surname(self, surname):
        self.surname = surname

    def set_year_of_birth(self, year_of_birth):
        self.year_of_birth = year_of_birth

    def age(self, current_year):
        return current_year - self.year_of_birth

    def __str__(self):
        return "%s %s was born in %d ." \
                % (self.name, self.surname, self.year_of_birth)


In this case, an empty instance of the class Person is created, and no attributes have been initialized while instantiating:

In [None]:
president = Person()

In [None]:
# This code will raise an attribute error:
print(president.name)

AttributeError: 'Person' object has no attribute 'name'

This raises an AttributeError. We need to set the attributes first:

In [None]:
president.set_name('John')
president.set_surname('Doe')
president.set_year_of_birth(1940)

In [None]:
print('Mr', president.name, president.surname,
      'is the president, and he is very old. He is',
      president.age(2014))

Mr John Doe is the president, and he is very old. He is 74
