# Pandas Introduction

Installation:

```bash
conda install pandas
```

![pandas01.PNG](images/pandas01.PNG)

Source: https://www.geeksforgeeks.org/creating-a-pandas-dataframe/

## Creating a data frame

In [None]:
import pandas as pd

Create an empty DataFrame with three named columns

In [None]:
m = pd.DataFrame(columns = ['A','B', 'C'])

In [None]:
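data = {'A': 1, 'B': 2, 'C': 3}
# DataFrame.append was removed in pandas 2.0; wrap the row in a one-row frame and concatenate
m = pd.concat([m, pd.DataFrame([data])], ignore_index=True)
m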
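Adding rows to a frame one at a time copies the existing data on every step. A common alternative, sketched below with made-up values, is to collect the rows in a plain Python list and build the frame once at the end. The loop and its values are purely illustrative.

```python
import pandas as pd

# Collect each row as a dictionary first (illustrative values only)
rows = []
for i in range(3):
    rows.append({'A': i, 'B': i * 2, 'C': i * 3})

# Build the frame in one step; columns= fixes the column order
m = pd.DataFrame(rows, columns=['A', 'B', 'C'])
print(m)
```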

Create some data and define key/value pairs

In [None]:
data = {
    'weight': [60, 50, 90, 75], 
    'height': [162, 167, 180, 175]
}

In [None]:
data

Create a dataframe

In [None]:
m = pd.DataFrame(data)

Each (key, value) item in `data` becomes one column of the resulting dataframe: the key is the column name and the list holds that column's values.

In [None]:
m
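To check the mapping from dictionary to columns without reading the rendered table, the column labels and the shape can be inspected directly. This is a small sketch that rebuilds the same frame so it runs on its own.

```python
import pandas as pd

data = {
    'weight': [60, 50, 90, 75],
    'height': [162, 167, 180, 175],
}
m = pd.DataFrame(data)

column_names = list(m.columns)  # the dictionary keys become column labels
n_rows, n_cols = m.shape        # the list length becomes the number of rows
print(column_names, n_rows, n_cols)
```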

The output above shows an automatic integer index from 0 to 3. We can replace it with our own row labels

In [None]:
m = pd.DataFrame(data, index=['Laura', 'Jenny', 'Martin', 'Tom'])
m

## Working with data frames

Let's select one column

In [None]:
m["weight"]

Let's select one row by its index label (Label-based Indexing with `.loc`)

In [None]:
m.loc['Martin']
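`.loc` looks rows up by label, while `.iloc` looks them up by integer position. The sketch below rebuilds the labelled frame from above and fetches the same row both ways.

```python
import pandas as pd

m = pd.DataFrame(
    {'weight': [60, 50, 90, 75], 'height': [162, 167, 180, 175]},
    index=['Laura', 'Jenny', 'Martin', 'Tom'],
)

by_label = m.loc['Martin']  # label-based: the row whose index label is 'Martin'
by_position = m.iloc[2]     # position-based: the third row, counting from 0
same = by_label.equals(by_position)
print(same)
```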

Select only rows based on values (Boolean Indexing)

In [None]:
m[m["weight"]<70]
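Boolean masks can be combined with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch on the same data:

```python
import pandas as pd

m = pd.DataFrame(
    {'weight': [60, 50, 90, 75], 'height': [162, 167, 180, 175]},
    index=['Laura', 'Jenny', 'Martin', 'Tom'],
)

# One boolean value per row; True where both conditions hold
mask = (m['weight'] < 70) & (m['height'] > 165)
light_and_tall = m[mask]
print(light_and_tall)
```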

Select using the Query API

In [None]:
m.query('weight <70 and height>165')
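Inside a query string, Python variables can be referenced with an `@` prefix, which avoids building the string by hand. The variable names `max_weight` and `min_height` below are illustrative.

```python
import pandas as pd

m = pd.DataFrame(
    {'weight': [60, 50, 90, 75], 'height': [162, 167, 180, 175]},
    index=['Laura', 'Jenny', 'Martin', 'Tom'],
)

max_weight = 70   # illustrative thresholds
min_height = 165

# @name refers to a variable in the surrounding Python scope
selected = m.query('weight < @max_weight and height > @min_height')
print(selected)
```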

In [None]:
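df2 = pd.DataFrame([[55, 174], [66, 168]], columns=['weight','height'])
# DataFrame.append was removed in pandas 2.0; pd.concat returns a new frame and leaves m unchanged
pd.concat([m, df2])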

Let's see whether it has been added. It has not: the combined frame was returned but never assigned back to `m`.

In [None]:
m

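Assign the result back to `m` and renumber the index with `ignore_index=True`

In [None]:
df2 = pd.DataFrame([[55, 174], [66, 168]], columns=['weight','height'])
# pd.concat replaces the removed DataFrame.append
m = pd.concat([m, df2], ignore_index=True)
m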
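The effect of `ignore_index` is easiest to see on two frames that both carry the default index. The small frames `a` and `b` below are illustrative stand-ins.

```python
import pandas as pd

a = pd.DataFrame({'weight': [60, 50], 'height': [162, 167]})
b = pd.DataFrame([[55, 174], [66, 168]], columns=['weight', 'height'])

kept = pd.concat([a, b])                           # original labels are kept, so they repeat
renumbered = pd.concat([a, b], ignore_index=True)  # a fresh index from 0 upwards

print(list(kept.index))
print(list(renumbered.index))
```

Repeated labels are legal, but `kept.loc[0]` would then return two rows, which is usually not what is wanted.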

Add a column

In [None]:
# insert(position, column name, values); the values need one entry per row
m.insert(2, "Age", [21, 23, 24, 21, 45, 33], allow_duplicates=True)

In [None]:
m
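`insert` places the column at a chosen position and changes the frame in place. Two other common ways to add a column are plain assignment, which appends at the end, and `assign`, which returns a new frame. The column name `height_m` is an assumption made for this sketch.

```python
import pandas as pd

m = pd.DataFrame({'weight': [60, 50, 90, 75], 'height': [162, 167, 180, 175]})

# Plain assignment modifies m and appends the column at the end
m['Age'] = [21, 23, 24, 21]

# assign leaves m untouched and returns a copy with the extra column
with_metres = m.assign(height_m=m['height'] / 100)

print(list(m.columns))
print(list(with_metres.columns))
```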

In [None]:
df1 = pd.DataFrame({
   "A": ["A0", "A1", "A2", "A3"],
   "B": ["B0", "B1", "B2", "B3"],
   "C": ["C0", "C1", "C2", "C3"],
   "D": ["D0", "D1", "D2", "D3"],
   },index=[0, 1, 2, 3])
df1

In [None]:
df2 = pd.DataFrame({
   "A": ["A4", "A5", "A6", "A7"],
   "B": ["B4", "B5", "B6", "B7"],
   "C": ["C4", "C5", "C6", "C7"],
   "D": ["D4", "D5", "D6", "D7"],
   },index=[4, 5, 6, 7])
df2

In [None]:
result = pd.concat([df1,df2],axis=0)
result

In [None]:
df3 = pd.DataFrame({
   "E": ["B2", "B3", "B6"],
   "F": ["D2", "D3", "D6"],
   "G": ["F2", "F3", "F6"],
   },index=[2, 3, 6])
df3


In [None]:
result = pd.concat([df1, df3], axis=1)
result
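With `axis=1` the frames are placed side by side and rows are matched on their index labels. By default every label from either frame is kept and gaps are filled with `NaN`. Passing `join='inner'` keeps only the labels present in both. A shortened sketch with fewer columns than above:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]},
    index=[0, 1, 2, 3],
)
df3 = pd.DataFrame(
    {"E": ["B2", "B3", "B6"], "F": ["D2", "D3", "D6"]},
    index=[2, 3, 6],
)

outer = pd.concat([df1, df3], axis=1)                # union of index labels
inner = pd.concat([df1, df3], axis=1, join='inner')  # intersection of index labels

print(outer)
print(inner)
```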

## Reading files in Pandas

Dataset info: https://archive.ics.uci.edu/ml/datasets/adult

Pandas documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

In [None]:
adult = pd.read_csv('adult.csv')
#adult = pd.read_csv('adult.csv', index_col=0)
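The commented-out variant uses `index_col=0`, which makes the first column of the file the row index instead of a regular column. The sketch below shows the difference on a tiny in-memory CSV. Its rows and the `id` column are made up for illustration and are not taken from the Adult dataset.

```python
import io
import pandas as pd

# Made-up sample text standing in for a CSV file on disk
csv_text = "id,age,education\n0,39,Bachelors\n1,50,HS-grad\n"

default_index = pd.read_csv(io.StringIO(csv_text))                # 'id' stays a column
first_col_index = pd.read_csv(io.StringIO(csv_text), index_col=0)  # 'id' becomes the index

print(list(default_index.columns))
print(list(first_col_index.columns))
```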

In [None]:
adult

Show just a few rows

In [None]:
adult.head()

Descriptive Analysis with pandas

In [None]:
adult["age"].mean()

In [None]:
adult["age"].median()

In [None]:
adult[["age", "capital-gain"]].describe()

In [None]:
adult.groupby("education")["age"].mean()
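`groupby` splits the rows by the values of one column and then computes a statistic per group. With `agg`, several statistics can be computed at once. The frame `people` below is a made-up stand-in so the example runs without the dataset file.

```python
import pandas as pd

# Made-up values, not taken from the Adult dataset
people = pd.DataFrame({
    'education': ['Bachelors', 'HS-grad', 'Bachelors', 'HS-grad'],
    'age': [30, 40, 50, 20],
})

mean_age = people.groupby('education')['age'].mean()
summary = people.groupby('education')['age'].agg(['mean', 'count'])

print(mean_age)
print(summary)
```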

In [None]:
adult["marital-status"].value_counts()
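`value_counts` returns absolute counts sorted from most to least frequent. With `normalize=True` it returns shares that sum to 1 instead. The series `status` below uses made-up values for illustration.

```python
import pandas as pd

# Made-up values, not taken from the Adult dataset
status = pd.Series(['Married', 'Single', 'Married', 'Divorced'])

counts = status.value_counts()                # absolute frequencies
shares = status.value_counts(normalize=True)  # relative frequencies

print(counts)
print(shares)
```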

In [None]:
hist = adult.hist(column=["age","capital-gain"],bins=10)
hist

In [None]:
adult['marital-status'].value_counts().plot(kind='bar',figsize=(10,10))