# Deep dive into Pandas DataFrames 

**Read the official documentation on pandas DataFrames @ https://pandas.pydata.org/pandas-docs/stable/reference/frame.html**

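**`Note:`** **Chaining functions/methods** in pandas works the same way as in plain Python: most DataFrame methods return a new object, so the next method can be called directly on the result.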
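As a minimal sketch of chaining, assuming a small made-up DataFrame (the `name` and `score` columns are hypothetical and not part of the Siena data), each call below operates on the result of the previous one:

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate chaining
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [3, 1, 4, 2]})

# Each method returns a new DataFrame, so the calls can be stacked
top_two = (
    df
    .sort_values("score", ascending=False)  # highest score first
    .head(2)                                # keep the top two rows
    .reset_index(drop=True)                 # renumber the rows 0, 1
)
print(top_two)
```

Wrapping the chain in parentheses lets each step sit on its own line, which keeps longer chains readable. The original `df` is left unchanged.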

DataFrames are **column oriented**, unlike most common databases. **Each column** in a DataFrame is a **pandas Series object**, so any operation that can be performed on a Series can also be applied to a column.
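To make the column/Series relationship concrete, here is a short sketch on a made-up frame (the values are illustrative only). Selecting a single column returns a Series, and Series methods work on it directly:

```python
import pandas as pd

# Hypothetical toy frame with two rating columns
df = pd.DataFrame({"Bg": [23, 30, 26], "Im": [22, 26, 23]})

bg = df["Bg"]          # selecting one column gives a Series
print(type(bg))
print(bg.max())        # any Series method can be used on the column
```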

There are **two axes** for a dataframe commonly referred to as axis 0 and 1, or the **"index"** (or 'rows') axis and the **"columns"** axis respectively. Note that, when an **operation** is applied **along axis 0**, it is applied **down the column**. Likewise, operations **along axis 1** operate **across the values in the row**.
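The axis convention is easy to mix up, so here is a small sketch with a made-up frame (column names `a`, `b` and index labels `x`, `y`, `z` are hypothetical). Summing along axis 0 collapses the rows and gives one value per column; summing along axis 1 collapses the columns and gives one value per row:

```python
import pandas as pd

# Hypothetical toy frame: 3 rows, 2 columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])

col_totals = df.sum(axis=0)   # down each column: one total per column
row_totals = df.sum(axis=1)   # across each row: one total per row

# The string aliases "index" and "columns" mean the same as 0 and 1
same_as_axis_1 = df.sum(axis="columns")

print(col_totals)
print(row_totals)
```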

--------------------
## Import Statements
--------------------------

In [2]:
# import statements
import numpy as np
import pandas as pd

In [16]:
# view options
pd.set_option("display.max_columns", 40)
pd.set_option("display.max_rows", 10)

---------------------------
### Importing the data
------------------------

We will be exploring a dataset from a 2018 Siena College poll. The data ranks United States presidents on various attributes.

In [17]:
# reading from github url
url = "https://github.com/mattharrison/datasets/raw/master/data/siena2018-pres.csv"
siena_2018 = pd.read_csv(url, index_col=0)

The column abbreviations in the dataset stand for:

• Bg = Background
• Im = Imagination
• Int = Integrity
• IQ = Intelligence
• L = Luck
• WR = Willing to take risks
• AC = Ability to compromise
• EAb = Executive ability
• LA = Leadership ability
• CAb = Communication ability
• OA = Overall ability
• PL = Party leadership
• RC = Relations with Congress
• CAp = Court appointments
• HE = Handling of economy
• EAp = Executive appointments
• DA = Domestic accomplishments
• FPA = Foreign policy accomplishments
• AM = Avoid crucial mistakes
• EV = Experts’ view
• O = Overall

In [13]:
siena_2018.sample(4).sort_index()

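| | Seq. | President | Party | Bg | Im | Int | IQ | L | WR | AC | EAb | LA | CAb | OA | PL | RC | CAp | HE | EAp | DA | FPA | AM | EV | O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 8 | Martin Van Buren | Democratic | 23 | 22 | 27 | 25 | 34 | 28 | 20 | 28 | 27 | 25 | 27 | 16 | 23 | 25 | 31 | 26 | 29 | 27 | 24 | 28 | 25 |
| 12 | 12 | Zachary Taylor | Whig | 30 | 26 | 22 | 32 | 37 | 24 | 26 | 26 | 25 | 32 | 32 | 35 | 32 | 37 | 27 | 33 | 27 | 30 | 26 | 30 | 30 |
| 22 | 22/24 | Grover Cleveland | Democratic | 26 | 23 | 26 | 27 | 19 | 27 | 22 | 19 | 20 | 19 | 22 | 20 | 27 | 20 | 21 | 23 | 23 | 21 | 15 | 22 | 23 |
| 25 | 26 | Theodore Roosevelt | Republican | 5 | 4 | 8 | 6 | 2 | 2 | 15 | 4 | 4 | 5 | 5 | 7 | 7 | 9 | 3 | 5 | 4 | 3 | 5 | 4 | 4 |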


---------------
## Mathematical operations on DataFrames
----------------

**Similar to Series objects, math operations on DataFrames are index aligned.**

Alignment takes each index entry of a column in the left DataFrame and matches it with every entry that has the same index label in the same column of the right DataFrame. This is repeated for all overlapping columns. If either DataFrame has duplicate index labels, the result can be surprising: pandas produces one row for every combination of matching labels, so the output can have more rows than either input.
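The combination behavior is most visible when both sides repeat a label. The sketch below uses two hypothetical single-column frames, `left` and `right`, that each contain the label 2 twice. The two rows on the left pair with the two rows on the right, giving four rows for that label:

```python
import pandas as pd

# Hypothetical frames: label 2 appears twice on each side
left = pd.DataFrame({"x": [1, 2, 3]}, index=[1, 2, 2])
right = pd.DataFrame({"x": [10, 20, 30]}, index=[2, 2, 3])

total = left + right

# Labels 1 and 3 exist on one side only, so their sums are missing.
# Label 2 yields every pairing: 2+10, 2+20, 3+10, 3+20.
print(total)
print(len(total))  # 1 + (2 * 2) + 1 rows
```

The order of the four rows for label 2 is an implementation detail, so it is safer to reason about which sums appear than about their positions.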

In [43]:
# s1: 3 rows and 4 columns
# s2: 2 rows and 5 columns
s1 = pd.DataFrame(
    np.linspace(2, 13, 12).reshape(3, 4),
    columns=["a1", "b1", "c1", "d1"],
    index=[1, 2, 3],
)
s2 = pd.DataFrame(
    np.linspace(2, 11, 10).reshape(2, 5),
    columns=["a1", "b1", "c1", "d1", "e1"],
    index=[2, 2],
)

In [44]:
s1 + s2

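| | a1 | b1 | c1 | d1 | e1 |
|---|---|---|---|---|---|
| 1 | NaN | NaN | NaN | NaN | NaN |
| 2 | 8.0 | 10.0 | 12.0 | 14.0 | NaN |
| 2 | 13.0 | 15.0 | 17.0 | 19.0 | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN |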


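As we can see, only the **overlapping rows** (index label 2) **and columns** (a1 through d1) get added together. Because `s2` has two rows labeled 2, the single row labeled 2 in `s1` is matched with each of them, so the result contains two rows labeled 2. All other values are missing. We can use the **`.add` method instead of `+` and pass a `fill_value`** if we want, similar to what we did with Series objects.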

In [45]:
s1.add(s2, fill_value=0)

| | a1 | b1 | c1 | d1 | e1 |
|---|---|---|---|---|---|
| 1 | 2.0 | 3.0 | 4.0 | 5.0 | NaN |
| 2 | 8.0 | 10.0 | 12.0 | 14.0 | 6.0 |
| 2 | 13.0 | 15.0 | 17.0 | 19.0 | 11.0 |
| 3 | 10.0 | 11.0 | 12.0 | 13.0 | NaN |
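Note that column `e1` still has missing values for labels 1 and 3. `fill_value` only substitutes for a value that is missing on one side; where a cell is missing from both frames, the result stays missing. The sketch below rebuilds `s1` and `s2` as above and, as one possible follow-up (an assumption about what you might want, not something the dataset requires), chains `.fillna(0)` to replace the remaining gaps:

```python
import numpy as np
import pandas as pd

# Same frames as in the cells above
s1 = pd.DataFrame(
    np.linspace(2, 13, 12).reshape(3, 4),
    columns=["a1", "b1", "c1", "d1"],
    index=[1, 2, 3],
)
s2 = pd.DataFrame(
    np.linspace(2, 11, 10).reshape(2, 5),
    columns=["a1", "b1", "c1", "d1", "e1"],
    index=[2, 2],
)

filled = s1.add(s2, fill_value=0)

# e1 exists only in s2, and s2 has no rows labeled 1 or 3,
# so those two cells are missing on both sides and stay NaN
print(filled["e1"].isna())

# Optional: replace whatever is still missing after the addition
no_gaps = filled.fillna(0)
print(no_gaps)
```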
