## PYTHON FOR DATA SCIENCE

### **Pandas**

A **Series** is a one-dimensional ndarray with axis labels (including time series).

The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. If there are no matching labels during alignment, pandas returns **NaN (Not a Number)** for those labels so that the operation does not fail.
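To make the alignment behaviour concrete, here is a minimal sketch. The two Series `a` and `b` and their labels are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Two hypothetical Series whose indexes only partly overlap
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])

# Addition aligns on labels; a label present in only one Series gives NaN
total = a + b
print(total)

# fill_value=0 treats a missing label as 0 instead of producing NaN
filled = a.add(b, fill_value=0)
print(filled)
```

Only the labels `y` and `z` exist in both Series, so only those two positions hold real sums in `total`.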

The name pandas is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals.

Read the pandas Series documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html

### Import Library

In [3]:
import pandas as pd
import numpy as np

**Series** is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.)

**DataFrame** is a two-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.)

In other words, a Series is a single labeled column, while a DataFrame is a collection of Series that share the same index.
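The relationship between the two structures can be sketched as follows. The names `weekday`, `subject` and `timetable` are hypothetical and chosen only for this example.

```python
import pandas as pd

# Two hypothetical Series that share the same index labels
weekday = pd.Series(['Monday', 'Tuesday'], index=['Day1', 'Day2'])
subject = pd.Series(['Economics', 'Finance'], index=['Day1', 'Day2'])

# A dict of Series becomes a DataFrame: each Series is one column
timetable = pd.DataFrame({'day': weekday, 'course': subject})
print(timetable)

# Selecting a single column gives a Series back
print(type(timetable['course']))
```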

In [4]:
my_pets = ['Lion', 'Cat', 'Birds', 'Fish']
my_pets

['Lion', 'Cat', 'Birds', 'Fish']

In [6]:
type(my_pets)

list

In [12]:
# Convert the list to a Series; pandas assigns a default integer index (0, 1, 2, ...)
pd.Series(my_pets)

0     Lion
1      Cat
2    Birds
3     Fish
dtype: object

**We can create our own index**

In [13]:
my_days=['Monday','Tuesday','Wednesday','Thursday','Friday']

In [14]:
my_courses=['Economics','Geograghy','Finance','Mathematics','History']

In [15]:
len(my_days)

5

In [16]:
len(my_courses)

5

In [17]:
pd.Series(my_courses)

0      Economics
1      Geograghy
2        Finance
3    Mathematics
4        History
dtype: object

In [18]:
pd.Series(my_courses, index=my_days)

Monday         Economics
Tuesday        Geograghy
Wednesday        Finance
Thursday     Mathematics
Friday           History
dtype: object

In [19]:
my_days1=['Monday','Tuesday','Wednesday','Thursday','Friday']

In [20]:
len(my_days1)

5

In [21]:
len(my_courses)

5

In [22]:
pd.Series(my_courses, index=my_days1)
# works because the values and the index have the same length (5);
# if the lengths differed, pandas would raise a ValueError

Monday         Economics
Tuesday        Geograghy
Wednesday        Finance
Thursday     Mathematics
Friday           History
dtype: object
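For contrast, here is a minimal sketch of the failing case. The lists `values` and `labels` are hypothetical and deliberately have different lengths.

```python
import pandas as pd

values = ['Economics', 'Finance', 'History']  # hypothetical data: 3 values
labels = ['Monday', 'Tuesday']                # hypothetical index: only 2 labels

error_message = None
try:
    # The index must have exactly one label per value
    pd.Series(values, index=labels)
except ValueError as err:
    error_message = str(err)

print(error_message)
```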

In [23]:
days = pd.Series(['Monday','Tuesday','Wednesday','Thursday','Friday'],
                 index =['Day1','Day2','Day3','Day4','Day5']) 
days

Day1       Monday
Day2      Tuesday
Day3    Wednesday
Day4     Thursday
Day5       Friday
dtype: object

In [24]:
courses= pd.Series(['Economics','Geograghy','Finance','Mathematics','History'],
                   index ='Day1 Day2 Day3 Day4 Day5'.split())
# str.split() is a quick way to build the list of index labels from one string
courses

Day1      Economics
Day2      Geograghy
Day3        Finance
Day4    Mathematics
Day5        History
dtype: object

In [25]:
days + courses # + concatenates the strings element-wise, matching values by index label


Day1        MondayEconomics
Day2       TuesdayGeograghy
Day3       WednesdayFinance
Day4    ThursdayMathematics
Day5          FridayHistory
dtype: object

In [26]:
days + ' ' + courses # same concatenation, with a space inserted between the two strings


Day1        Monday Economics
Day2       Tuesday Geograghy
Day3       Wednesday Finance
Day4    Thursday Mathematics
Day5          Friday History
dtype: object
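The same result can also be obtained with the `Series.str.cat` method, which takes the separator as an argument. This is a sketch on shortened copies of the two Series rather than the notebook's own cell.

```python
import pandas as pd

# Shortened copies of the two Series, for illustration only
days = pd.Series(['Monday', 'Tuesday'], index=['Day1', 'Day2'])
courses = pd.Series(['Economics', 'Finance'], index=['Day1', 'Day2'])

# str.cat joins the strings element-wise and inserts sep between them
combined = days.str.cat(courses, sep=' ')
print(combined)
```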

In [27]:
days

Day1       Monday
Day2      Tuesday
Day3    Wednesday
Day4     Thursday
Day5       Friday
dtype: object

#### NB: You can also use merge
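One possible reading of this note, sketched under assumptions: a Series has no `merge` method of its own, so each Series is first converted to a one-column DataFrame with `to_frame`. The column names `day` and `course` are hypothetical. `pd.concat` is shown as an alternative that gives the same table here.

```python
import pandas as pd

# Shortened copies of the two Series, for illustration only
days = pd.Series(['Monday', 'Tuesday', 'Wednesday'],
                 index=['Day1', 'Day2', 'Day3'])
courses = pd.Series(['Economics', 'Geography', 'Finance'],
                    index=['Day1', 'Day2', 'Day3'])

# Convert each Series to a one-column DataFrame, then merge on the index
merged = days.to_frame('day').merge(courses.to_frame('course'),
                                    left_index=True, right_index=True)
print(merged)

# Alternative: place the two Series side by side as columns
side_by_side = pd.concat([days, courses], axis=1, keys=['day', 'course'])
print(side_by_side)
```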

In [29]:
courses

Day1      Economics
Day2      Geograghy
Day3        Finance
Day4    Mathematics
Day5        History
dtype: object

In [30]:
courses['Day5']

'History'

In [31]:
days

Day1       Monday
Day2      Tuesday
Day3    Wednesday
Day4     Thursday
Day5       Friday
dtype: object

In [33]:
days['Day3']

'Wednesday'

### loc & iloc

**loc** selects rows (or columns) by their labels (names) in the index.

**iloc** selects rows (or columns) by their integer positions in the index.

In [36]:
# creating a dictionary

sports = {'Football': 'Spain',
          'NBA': 'USA',
          'Cricket': 'India',
          'Athelets': 'Jamaica'}


sports_series = pd.Series(sports)
sports_series

Football      Spain
NBA             USA
Cricket       India
Athelets    Jamaica
dtype: object

In [37]:
sports_series.loc['Cricket']

'India'

In [38]:
sports_series.iloc[2]

'India'

In [39]:
sports_series[3]  # plain [] falls back to position here; recent pandas versions deprecate this, so prefer .iloc[3]

'Jamaica'
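One difference that the single-value lookups above do not show is slicing. The sketch below rebuilds a shortened version of the Series with three entries; it is illustrative, not the notebook's own cell.

```python
import pandas as pd

# Shortened version of the sports Series, for illustration only
sports_series = pd.Series({'Football': 'Spain',
                           'NBA': 'USA',
                           'Cricket': 'India'})

# loc slices by label and includes the end label
by_label = sports_series.loc['Football':'NBA']
print(by_label)

# iloc slices by position and excludes the end position
by_position = sports_series.iloc[0:1]
print(by_position)

# A list of labels selects several entries in the order given
picked = sports_series.loc[['Cricket', 'Football']]
print(picked)
```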

### DataFrames

A DataFrame is two-dimensional, size-mutable, potentially heterogeneous tabular data.

The data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, and it is the primary pandas data structure.
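Alignment on both axes can be sketched with two small DataFrames. The frames `left` and `right` and their labels are hypothetical.

```python
import pandas as pd

# Two hypothetical DataFrames that overlap in one row label (r2) and one column (b)
left = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['r1', 'r2'])
right = pd.DataFrame({'b': [10, 20], 'c': [30, 40]}, index=['r2', 'r3'])

# Addition aligns on row labels and column labels at the same time;
# any cell missing from either frame becomes NaN
result = left + right
print(result)
```

Only the cell at row `r2`, column `b` exists in both frames, so it is the only cell in `result` that is not NaN.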

Read Pandas DataFrame documentation: https://bit.ly/2Ufe2BJ

In [40]:
np.random.randn(10,5) # 10x5 array of samples from the standard normal distribution

array([[ 0.05320698,  0.0952199 ,  0.15771375, -1.4089969 , -0.221591  ],
       [ 0.50948987,  2.53975395, -1.47987841,  0.65349619, -0.58946446],
       [ 0.9310261 , -1.4694163 ,  0.75604795, -0.38078823,  0.33287049],
       [ 0.46791381,  0.1964594 ,  0.95223106,  0.41689602, -0.51014984],
       [-1.45550141,  0.49763331,  1.57893289,  1.28329258, -0.12851765],
       [ 0.49121612,  1.50576232,  0.21941392, -0.53971796,  0.12593189],
       [-0.17081161, -0.95159369, -0.30608548, -0.32200722, -0.20844686],
       [-0.51095765,  0.45270353, -0.91311695, -0.29432077, -1.00018244],
       [-0.10101773,  0.74749863,  0.06732175, -0.32408945,  0.86734336],
       [-1.37794008,  0.74155046, -1.4035594 , -1.27500051, -0.30358625]])
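An array like this can be wrapped in a DataFrame, as in the sketch below. The column names `A` to `E` and the seed are assumptions for illustration. The seeded generator `np.random.default_rng` is used instead of `np.random.randn` so that the numbers are reproducible.

```python
import numpy as np
import pandas as pd

# Seeded generator (assumption: a fixed seed is used so reruns give the same values)
rng = np.random.default_rng(42)
values = rng.standard_normal((10, 5))  # 10 rows, 5 columns

# Hypothetical column names; rows get the default integer index 0..9
df = pd.DataFrame(values, columns=['A', 'B', 'C', 'D', 'E'])
print(df.shape)
print(df.head())

# Each column of the DataFrame is a Series
print(type(df['A']))
```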