# Pandas Tutorial Day 2

We will look at the following:
1. Creating dataframes
2. Dealing with rows and columns
3. Operations: min, max, std, describe, head, tail, etc.
4. Conditional selection
5. set_index

## Creating Dataframes
A dataframe is the main data structure for storing data in Pandas. It represents data as rows and columns, similar to what we see in Excel sheets.
We can create dataframes from .csv files or from dictionaries.

In [None]:
import pandas as pd

# using csv files
df = pd.read_csv('database.csv')
print(df)

# using dictionaries
dataDict = {
    'ID':['1', '2', '3', '4', '5'],
    'Name':['Ajay', 'Ramesh', 'Kris', 'Alto', 'Tenny'],
    'Surname': ['Singh', 'Kumar', 'Grusing', 'Pat', 'Jennings'],
    'Score': [22, 37, 50, 72, 10],
    'Age':[26, 22, 31, 30, 28]
}
print(pd.DataFrame(dataDict))

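# if database.csv holds the same records, both ways produce equivalent dataframes
# (column dtypes may differ: 'ID' is stored as strings in the dictionary above)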
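
To compare the two construction routes, the sketch below writes the dictionary out as CSV text and reads it back. The in-memory `io.StringIO` buffer is only a stand-in for `database.csv`, whose actual contents are not shown here, and the names `df_dict` and `df_csv` are chosen for illustration.

```python
import io

import pandas as pd

data_dict = {
    'ID': ['1', '2', '3', '4', '5'],
    'Name': ['Ajay', 'Ramesh', 'Kris', 'Alto', 'Tenny'],
    'Surname': ['Singh', 'Kumar', 'Grusing', 'Pat', 'Jennings'],
    'Score': [22, 37, 50, 72, 10],
    'Age': [26, 22, 31, 30, 28],
}
df_dict = pd.DataFrame(data_dict)

# Assumption: the CSV file holds the same records as the dictionary.
# We simulate it by serialising df_dict to CSV text in memory.
csv_text = df_dict.to_csv(index=False)
df_csv = pd.read_csv(io.StringIO(csv_text))

# Same shape and column names either way
print(df_dict.shape, df_csv.shape)
print(list(df_dict.columns) == list(df_csv.columns))

# The 'ID' column differs in dtype: strings in the dictionary,
# integers after read_csv infers the type from the text
print(df_dict['ID'].dtype, df_csv['ID'].dtype)
```

The values match, but `read_csv` infers column types from the text, so a column of quoted digits in a dictionary and the same digits in a file will not necessarily end up with the same dtype.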

In [None]:
# getting the dimensions of the dataframe
# returns a tuple, (rows, columns), which can be stored in variables

rows, columns = df.shape
print(f"The dataframe has {rows} rows and {columns} columns.")

In [None]:
# when we have a large dataset, we can use df.head(n) and df.tail(n) to view the first and last n entries in our dataframe
# (n defaults to 5 when no argument is given)

print(df.head(1)) # shows the first row

print(df.tail(2)) # shows the last two rows
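
As a small self-contained illustration of how the argument to `head` and `tail` controls the number of rows, here is a sketch on a hypothetical ten-row dataframe (`df_demo` and its `value` column are made up for this example).

```python
import pandas as pd

# Hypothetical ten-row dataframe, used only for illustration
df_demo = pd.DataFrame({'value': range(10)})

print(df_demo.head())   # no argument: the first 5 rows
print(df_demo.head(2))  # the first 2 rows
print(df_demo.tail(3))  # the last 3 rows
```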

## Dealing with rows and columns
As with Python lists, we can slice a dataframe to get the rows we want.

In [None]:
# printing the rows at positions 2 to 4 (counting from 0; the stop value 5 is excluded)
df[2:5]
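
Row slicing with `df[start:stop]` works by position, so it should give the same result as `df.iloc[start:stop]`. The sketch below checks this on a small dataframe built from the tutorial's dictionary data (`df_demo` is a name chosen for this example).

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Ajay', 'Ramesh', 'Kris', 'Alto', 'Tenny'],
    'Age': [26, 22, 31, 30, 28],
})

by_slice = df_demo[2:5]      # plain slicing, as used above
by_iloc = df_demo.iloc[2:5]  # explicit position-based slicing

print(by_slice.equals(by_iloc))
print(by_slice['Name'].tolist())
```

Using `iloc` makes the intent explicit, which can help once the index is no longer the default 0, 1, 2, ... sequence.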

Now to print the columns we can use the following

In [None]:
# prints all the columns in the dataframe
print(df.columns)

# to print the content of individual columns
print(df.Age)

# or we can do the following
print(df['Age'])

Each column in our dataframe is a Pandas Series.

In [None]:
print(type(df.Age))

If we want to print only certain columns from our dataframe, we can pass a list of column names:

In [None]:
df[['Age', 'Name']]
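
The difference between single and double brackets is worth seeing side by side: a single column name gives a Series, while a list of names gives a dataframe, even if the list has only one entry. A minimal sketch, using a small dataframe built for this example:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Ajay', 'Ramesh', 'Kris', 'Alto', 'Tenny'],
    'Score': [22, 37, 50, 72, 10],
    'Age': [26, 22, 31, 30, 28],
})

single = df_demo['Age']            # one name: a Series
subset = df_demo[['Age', 'Name']]  # list of names: a DataFrame
only_age = df_demo[['Age']]        # one-item list: still a DataFrame

print(type(single))
print(type(subset))
print(only_age.shape)
```

The columns come back in the order given in the list, not the order in which they appear in the original dataframe.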

## Operations
We can perform a number of operations on our dataframe to retrieve relevant statistics about our data.

In [None]:
# finding the maximum and minimum
print(f"The minimum age in the dataframe is: {df['Age'].min()}")
print(f"The maximum score in the dataframe is: {df['Score'].max()}")

# finding the average score of the dataset
print(f"The average score of the dataset is: {df.Score.mean()}")

To see several statistics in one go, we can use the `df.describe()` method.

In [None]:
# to get various statistics in one go
print(df.describe())
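
To make the output of `describe()` easier to read, the sketch below lists the statistics it reports and shows that, by default, only numeric columns are summarised. It also calls `std()` directly, which computes the sample standard deviation. The dataframe `df_demo` is built from the tutorial's dictionary data for this example.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Ajay', 'Ramesh', 'Kris', 'Alto', 'Tenny'],
    'Score': [22, 37, 50, 72, 10],
    'Age': [26, 22, 31, 30, 28],
})

summary = df_demo.describe()

# Row labels: the statistics that describe() computes
print(summary.index.tolist())

# Column labels: 'Name' is skipped because it is not numeric
print(summary.columns.tolist())

# Individual statistics can be read out of the summary or computed directly
print(summary.loc['mean', 'Score'])
print(df_demo['Score'].std())
```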

## Conditional Selection
We can also perform selection based on some conditions

In [None]:
# information of people whose age is more than 30
print(df[df.Age > 30])

# printing the age and score of people whose score was more than 20
# (.loc takes the row condition and the column list in a single step)
print(df.loc[df['Score'] > 20, ['Age', 'Score']])

# printing the data for person with maximum age
print(df[df.Age == df.Age.max()])
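
Behind each of these selections is a boolean Series (a "mask") with one True/False value per row. The sketch below prints such a mask and then combines two conditions with `&`. The combined condition is an illustration added here, not one of the tutorial's own examples, and `df_demo` is built from the dictionary data.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Ajay', 'Ramesh', 'Kris', 'Alto', 'Tenny'],
    'Score': [22, 37, 50, 72, 10],
    'Age': [26, 22, 31, 30, 28],
})

# The comparison itself produces a boolean Series
mask = df_demo['Age'] > 30
print(mask)

# Indexing with the mask keeps only the rows marked True
print(df_demo[mask])

# Conditions are combined with & (and) or | (or);
# each condition needs its own parentheses
both = df_demo[(df_demo['Age'] > 25) & (df_demo['Score'] > 20)]
print(both['Name'].tolist())
```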

## set_index
If we want to change the index in a dataframe, we can use `set_index(colName)`

In [None]:
# prints the index; for the default RangeIndex this shows its start, stop and step
print(df.index)

# changes the index to the 'Age' column
# inplace=True modifies the original dataframe and returns None, so we print df afterwards;
# without inplace=True, set_index returns a new dataframe and leaves the original unchanged
df.set_index('Age', inplace=True)
print(df)

# gives us all the details of the entry with 'Age' == 28
print(df.loc[28])

# to reset the index
df.reset_index(inplace = True)

# printing the dataframe and the index
print(df)
print(df.index)
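
An alternative to `inplace=True` is to keep the dataframe that `set_index` returns. The sketch below shows both styles on a small dataframe built from the dictionary data; the names `df_demo`, `indexed`, `restored` and `scratch` are chosen for this example.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Ajay', 'Ramesh', 'Kris', 'Alto', 'Tenny'],
    'Score': [22, 37, 50, 72, 10],
    'Age': [26, 22, 31, 30, 28],
})

# Without inplace, set_index returns a new dataframe
indexed = df_demo.set_index('Age')
print(df_demo.index)    # the original still has its default index
print(indexed.loc[28])  # look up the row whose Age is 28

# reset_index moves 'Age' back into the columns, as the first column
restored = indexed.reset_index()
print(restored.columns.tolist())

# With inplace=True the call returns None and changes the dataframe itself
scratch = df_demo.copy()
result = scratch.set_index('Name', inplace=True)
print(result)
print(scratch.index.tolist())
```

Note that after `reset_index` the former index column comes first, so the column order can differ from the original dataframe.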