### <mark> Pandas DataFrame

pd.DataFrame(
    data=None,
    index: 'Optional[Axes]' = None,
    columns: 'Optional[Axes]' = None,
    dtype: 'Optional[Dtype]' = None,
    copy: 'bool' = False,
)

**Two-dimensional, size-mutable, potentially heterogeneous tabular data.**

The structure carries labeled axes (rows and columns), and arithmetic
operations align on both row and column labels. It can be thought of as
**a dict-like container for Series objects**, and it is the primary
pandas data structure.

data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
    Dict can contain Series, arrays, constants, dataclass or list-like objects. If
    data is a dict, column order follows insertion-order.
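
The "dict-like container for Series" description can be made concrete with a small sketch. The `height` and `weight` Series and their values are made up for illustration; the point is that each dict key becomes a column and the row labels are aligned across the Series.

```python
import pandas as pd

# Two Series with partially overlapping row labels (made-up values).
height = pd.Series([1.7, 1.8], index=["a", "b"])
weight = pd.Series([65, 80], index=["b", "c"])

# Each dict key becomes a column; rows are the union of the Series indexes.
people = pd.DataFrame({"height": height, "weight": weight})

print(list(people.columns))  # column order follows dict insertion order
print(list(people.index))    # union of both indexes
print(people["height"])      # selecting a column returns a Series
```

Labels present in only one Series are filled with NaN in the other column, which is the same label alignment that arithmetic between frames relies on.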

#### Creating a Pandas DataFrame

    from dictionary
    from list of tuples
    from arrays
    from series
    from csv, tsv, excel, etc
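
As a minimal sketch of the sources listed above that the cells below do not cover, the following builds small frames from a list of tuples, a NumPy array, Series, and CSV text. All names and values are illustrative, and `io.StringIO` is used only as a stand-in for a real file path.

```python
import io

import numpy as np
import pandas as pd

# From a list of tuples: each tuple is one row.
df_tuples = pd.DataFrame([("Tony", "Stark"), ("Steve", "Rogers")],
                         columns=["first", "last"])

# From a 2-D NumPy array: column names are supplied separately.
df_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# From Series: each Series becomes one column.
df_series = pd.DataFrame({"first": pd.Series(["Tony", "Steve"]),
                          "last": pd.Series(["Stark", "Rogers"])})

# From CSV text: io.StringIO stands in for a file path here.
csv_text = "first,last\nTony,Stark\nSteve,Rogers\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
```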

In [2]:
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', 100)
pd.set_option('display.max_colwidth', 104)
pd.set_option('display.max_rows', 106)

In [3]:
# read_csv already returns a DataFrame, so no extra pd.DataFrame(...) wrapper is needed
df = pd.read_csv('diabetes.csv')

In [4]:
data={
    "first":["Sam","Karen","Paul","Chris",np.nan,None,'NA'],
    "last":["Peri","Shuffer","Baget","Pottin",np.nan,np.nan,'Missing'],
    "email":["SP@gmail","KS@yahoo","PB@facebook","CP@gmail",None,np.nan,"anonymous@gmail"],
    "age":['33','33','63','36',None,None,'Missing']
}

df_people=pd.DataFrame(data)

In [5]:
data={
    "first":["Tony", "Steve"],
    "last":["Stark", "Rogers"],
    "email":["iron@gmail", "captain@gmail"]
}

df_avengers=pd.DataFrame(data)

#### Accessing elements from dataframe

In [7]:
df.columns

Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome'],
      dtype='object')

In [8]:
df.index

RangeIndex(start=0, stop=768, step=1)

In [34]:
new_df = df.sort_index(ascending=False)

In [9]:
df.dtypes

Pregnancies                   int64
Glucose                       int64
BloodPressure                 int64
SkinThickness                 int64
Insulin                       int64
BMI                         float64
DiabetesPedigreeFunction    float64
Age                           int64
Outcome                       int64
dtype: object

### <mark> Slicing of dataframe

In [13]:
df[5:7]

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 5 | 116 | 74 | 0 | 0 | 25.6 | 0.201 | 30 | 0 |
| 6 | 3 | 78 | 50 | 32 | 88 | 31.0 | 0.248 | 26 | 1 |


In [12]:
df[['Age', 'BMI']][0:10:3]

| | Age | BMI |
|---|---|---|
| 0 | 50 | 33.6 |
| 3 | 21 | 28.1 |
| 6 | 26 | 31.0 |
| 9 | 54 | 0.0 |
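
The two slices above are positional: `df[5:7]` excludes the end position, and the column list is applied before the stepped row slice. A small sketch on a made-up frame (the `toy` name, its values, and its non-default row labels are assumptions for illustration) contrasts this with label-based `.loc` slicing.

```python
import pandas as pd

# Small stand-in frame with non-default row labels (made-up values).
toy = pd.DataFrame({"Age": [50, 31, 32, 21], "BMI": [33.6, 26.6, 23.3, 28.1]},
                   index=[10, 11, 12, 13])

by_position = toy[1:3]        # rows at positions 1 and 2 (end excluded)
same_rows = toy.iloc[1:3]     # explicit positional form of the same slice
by_label = toy.loc[11:12]     # label slice: both endpoints included
every_other = toy[["Age", "BMI"]][0:4:2]  # column list first, then a stepped row slice

print(by_position)
print(by_label)
print(every_other)
```

With the default `RangeIndex` the positions and labels coincide, which is why the difference is easy to miss on the diabetes data.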


In [17]:
# iterate over a slice of rows with iterrows(); each row arrives as a Series
for index, row in df[17:19].iterrows():
    print(index, row[['BMI', 'Age']])

17 BMI    29.6
Age    31.0
Name: 17, dtype: float64
18 BMI    43.3
Age    33.0
Name: 18, dtype: float64
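
In the output above, `Age` prints as `31.0` although the column is `int64`: `iterrows()` packs each row into a single Series, so mixed integer and float values are upcast to one dtype. The sketch below reproduces this on a two-row stand-in frame (values copied from the output above) and shows `itertuples()` as one possible alternative that keeps per-column types.

```python
import pandas as pd

toy = pd.DataFrame({"BMI": [29.6, 43.3], "Age": [31, 33]}, index=[17, 18])

# iterrows yields (index, Series); a mixed int/float row is upcast to float64.
for index, row in toy.iterrows():
    print(index, row["Age"], row.dtype)

# itertuples yields namedtuples, so each field keeps its column's own type.
for record in toy.itertuples():
    print(record.Index, record.Age, record.BMI)
```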


### <mark> Changing the dataframe structure

#### Adding new column

In [45]:
df['full_name'] = 'robert ferguson'
df['heaviness'] = df['Glucose'] + df['SkinThickness']

In [32]:
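# start with an empty (object-dtype) column, then label rows with boolean masks
df['SugarLevel'] = None

# rows with Glucose exactly 100 match neither mask and stay unlabelled
df.loc[df['Glucose']>100, "SugarLevel"] = 'High'
df.loc[df['Glucose']<100, "SugarLevel"] = 'Low'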
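
Neither mask above matches a `Glucose` of exactly 100, so such rows keep the placeholder value. One possible alternative, sketched here on a made-up three-row frame, is `np.where`, which assigns one of two labels to every row in a single step. Putting the boundary value under "Low" is an assumption, not something the notebook specifies.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"Glucose": [85, 100, 148]})

# np.where picks "High" where the condition holds and "Low" everywhere else,
# so a reading of exactly 100 is labelled too (here it falls under "Low").
toy["SugarLevel"] = np.where(toy["Glucose"] > 100, "High", "Low")

print(toy)
```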

In [33]:
df.sample()

| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome | heaviness | body_fats | SugarLevel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 534 | 1 | 77 | 56 | 30 | 56 | 33.3 | 1.251 | 24 | 0 | 107 | 0 | Low |


#### Renaming columns

    df.columns = ['id', 'name', 'gender', ...]
    df.columns = [x.lower() for x in df.columns]
    df.columns = df.columns.str.replace(" ", "_")

    df.rename(columns={'DiabetesPedigreeFunction':'Pedigree', 'Insulin':'InsulinLevel'})

In [36]:
df.rename(columns={'DiabetesPedigreeFunction':'Pedigree', 'Insulin':'InsulinLevel'},
         inplace=True)
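
A short sketch of the renaming options on a made-up two-column frame (the `toy` frame and its column names are illustrative). It shows that `rename` without `inplace=True` returns a new frame and leaves the original untouched, and that the `.str` accessor on the column index can normalise every name at once.

```python
import pandas as pd

toy = pd.DataFrame({"Insulin": [0, 94], "Blood Pressure": [72, 66]})

# Without inplace=True, rename returns a new frame; toy keeps its names.
renamed = toy.rename(columns={"Insulin": "InsulinLevel"})

# Normalise every name at once through the .str accessor on the column Index.
tidy = toy.copy()
tidy.columns = tidy.columns.str.lower().str.replace(" ", "_")

print(list(toy.columns))
print(list(renamed.columns))
print(list(tidy.columns))
```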

In [44]:
# removing column

df.drop(columns=['SugarLevel'],inplace=True)

In [46]:
df[['first', 'last']] = df['full_name'].str.split(' ', expand=True)
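
The split above yields exactly two columns because every `full_name` contains a single space. As a hedged sketch for names with more parts (the sample names are made up), passing `n=1` limits the split to the first space so the result still fits two target columns.

```python
import pandas as pd

toy = pd.DataFrame({"full_name": ["robert ferguson", "mary jane watson"]})

# n=1 splits on the first space only, so the result always has two columns.
toy[["first", "last"]] = toy["full_name"].str.split(" ", n=1, expand=True)

print(toy)
```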

In [69]:
# change datatype
df['Age'] = df['Age'].astype(float)
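
`astype(float)` works here because `Age` is already numeric. For a column of strings such as the `age` values in `df_people`, which include `'Missing'`, the conversion would fail. A minimal sketch of one alternative, using a made-up `ages` Series, is `pd.to_numeric` with `errors="coerce"`.

```python
import pandas as pd

ages = pd.Series(["33", "63", None, "Missing"])

# astype(float) would raise on "Missing"; to_numeric with errors="coerce"
# turns anything it cannot parse into NaN instead.
ages_numeric = pd.to_numeric(ages, errors="coerce")

print(ages_numeric)
```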

#### Changing cell values

In [40]:
df.loc[34] = 20                                   # sets every column of row 34 to 20
df.loc[34:36, ['InsulinLevel', 'Glucose']] = 44   # label slice: rows 34 through 36 inclusive
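
The following sketch, on a made-up four-row frame, illustrates that a `.loc` label slice includes both endpoints when assigning, and shows `.at` as an option for a single cell. The `toy` frame and its values are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({"Glucose": [90, 91, 92, 93], "Age": [20, 21, 22, 23]},
                   index=[34, 35, 36, 37])

toy.loc[34:36, ["Glucose"]] = 44   # labels 34, 35 and 36 are all updated
toy.at[37, "Age"] = 50             # .at targets a single cell by label

print(toy)
```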

#### Adding/removing rows

In [50]:
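# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat replaces it.
# Like append, concat returns a new DataFrame, so df is unchanged unless the result is assigned back.
pd.concat([df, pd.DataFrame([{'Glucose': 189, 'BloodPressure': 136}])], ignore_index=True)
df.tail(1)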


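| | Pregnancies | Glucose | BloodPressure | SkinThickness | InsulinLevel | BMI | Pedigree | Age | Outcome | heaviness | body_fats | full_name | first | last |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 684 | 5 | 136 | 82 | 0 | 0 | 0.0 | 0.64 | 69 | 0 | 136 | 0 | robert ferguson | robert | ferguson |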


In [59]:
df.drop(index=5, inplace=True, errors='ignore')

# drop the rows that match a filter condition
df.drop(index=df[df['BloodPressure']>120].index, inplace=True, errors='ignore')
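
Dropping by the index of a filtered frame can also be written as a plain boolean selection of the rows to keep. The sketch below uses a made-up `BloodPressure` column. The two forms agree only under the assumptions that the index labels are unique and the column has no missing values.

```python
import pandas as pd

toy = pd.DataFrame({"BloodPressure": [72, 122, 66, 130]})

# Dropping the index labels of the matching rows...
dropped = toy.drop(index=toy[toy["BloodPressure"] > 120].index)
# ...keeps the same rows as selecting the complement of the condition.
kept = toy[toy["BloodPressure"] <= 120]

print(dropped)
```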

### <mark> Exporting

    df_titanic.to_csv("titanic.csv", sep="\t")
    df_titanic.to_excel("titanic.xlsx", index=False)
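
As a minimal round-trip sketch of the tab-separated export, the following writes a made-up frame to an in-memory buffer (a stand-in for a file on disk) and reads it back. The frame contents and the `buffer` name are illustrative.

```python
import io

import pandas as pd

toy = pd.DataFrame({"name": ["Ann", "Bob"], "age": [29, 41]})

buffer = io.StringIO()                     # stands in for a file on disk
toy.to_csv(buffer, sep="\t", index=False)  # tab-separated, row labels omitted
buffer.seek(0)

restored = pd.read_csv(buffer, sep="\t")   # the same separator is needed to read it back

print(restored)
```

Without `index=False`, the row labels are written as an extra unnamed first column, which then reappears as `Unnamed: 0` when the file is read back.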

### <mark> Simple mathematical functions

In [25]:
df.nlargest(2, "BMI")
# Return the first `n` rows ordered by `columns` in descending order.

# df.nsmallest(2, "BMI")
# Return the first `n` rows ordered by `columns` in ascending order. 

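| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome | heaviness |
|---|---|---|---|---|---|---|---|---|---|---|
| 177 | 0 | 129 | 110 | 46 | 130 | 67.1 | 0.319 | 26 | 1 | 175 |
| 445 | 0 | 180 | 78 | 63 | 14 | 59.4 | 2.42 | 25 | 1 | 243 |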
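
For intuition, `nlargest` can be compared with sorting and taking the head. The sketch below uses a made-up `BMI` column with distinct values; with ties the two approaches may order rows differently.

```python
import pandas as pd

toy = pd.DataFrame({"BMI": [33.6, 67.1, 26.6, 59.4]})

top_two = toy.nlargest(2, "BMI")                            # two highest BMI rows
via_sort = toy.sort_values("BMI", ascending=False).head(2)  # same rows via sorting
bottom_two = toy.nsmallest(2, "BMI")                        # two lowest BMI rows

print(top_two)
print(bottom_two)
```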
