# Intro to Pandas Library

The pandas library is built mainly around two data structures:
- Series: a one-dimensional labeled array
- DataFrame: a two-dimensional labeled structure for tabular data (rows and columns)
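
Here is a minimal sketch that makes the 1D versus 2D distinction concrete. The labels and values are arbitrary placeholders, not data from this notebook.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# A DataFrame is two-dimensional: labeled rows and named columns
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]})

print(s.ndim, df.ndim)   # 1 2
print(df.shape)          # (3, 2)

# Selecting a single column of a DataFrame returns a Series
print(type(df['x']).__name__)  # Series
```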

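## DataFrame Characteristics
- Mutable: values can be changed and columns added after creation
- Index-based: rows (and columns) are identified by labels
- A full-featured class (`pd.DataFrame`) with many built-in methods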
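
The sketch below shows the first two properties on a made-up two-row table. The row labels `r1`/`r2` and the `Doubled` column are for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'Amount': [150, 250]}, index=['r1', 'r2'])

# Mutable: a value can be changed in place...
df.loc['r1', 'Amount'] = 175
# ...and a column can be added after creation
df['Doubled'] = df['Amount'] * 2

# Index-based: .loc looks rows up by label, .iloc by integer position
print(df.loc['r2', 'Amount'])   # 250
print(df.iloc[0]['Doubled'])    # 350
```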

## Common data formats

- JSON
- CSV



In [2]:
import pandas as pd

In [4]:
# Create an empty DataFrame with two named columns
df_one = pd.DataFrame(columns=['Column1', 'Column2'])
df_one

|   | Column1 | Column2 |
|---|---------|---------|


In [6]:
# Rename the columns by assigning a new list of names
df_one.columns = ['First', 'Second']
df_one

|   | First | Second |
|---|-------|--------|
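
Assigning to `df.columns` replaces every name at once and needs exactly one name per column. When only some columns should change, `DataFrame.rename()` is an alternative. A minimal sketch (the variable `renamed` is illustrative):

```python
import pandas as pd

df = pd.DataFrame(columns=['Column1', 'Column2'])

# rename() maps old names to new ones; unlisted columns keep their names.
# It returns a new DataFrame unless inplace=True is passed.
renamed = df.rename(columns={'Column1': 'First'})
print(list(renamed.columns))  # ['First', 'Column2']

# Assigning to .columns requires exactly one name per column
try:
    df.columns = ['OnlyOne']
except ValueError as err:
    print('ValueError:', err)
```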


In [7]:
# Build a DataFrame from a dict
data1 = {
    'First':[1, 2, 'Real'],     # key = column name, value = list of that column's values
    'Second':[22, 44, 'Madrid']
}
# Create new df with data
df_two = pd.DataFrame(data1)
df_two

|   | First | Second |
|---|-------|--------|
| 0 | 1 | 22 |
| 1 | 2 | 44 |
| 2 | Real | Madrid |
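
Because `First` and `Second` each mix integers and strings, pandas stores them with the generic `object` dtype. The sketch below checks that and builds the same table row by row from a list of dicts, another input the `pd.DataFrame` constructor accepts (`rows` and `df_rows` are illustrative names).

```python
import pandas as pd

data1 = {
    'First': [1, 2, 'Real'],
    'Second': [22, 44, 'Madrid'],
}
df_two = pd.DataFrame(data1)

# Both columns mix numbers and strings, so both show dtype object
print(df_two.dtypes)

# The same table built from a list of dicts, one dict per row
rows = [
    {'First': 1, 'Second': 22},
    {'First': 2, 'Second': 44},
    {'First': 'Real', 'Second': 'Madrid'},
]
df_rows = pd.DataFrame(rows)
print(df_rows.shape)  # (3, 2)
```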


In [8]:
# Concatenate two DataFrames row-wise; df_one has no rows yet, so the result holds only df_two's rows
df_three = pd.concat([df_one, df_two])
df_three

|   | First | Second |
|---|-------|--------|
| 0 | 1 | 22 |
| 1 | 2 | 44 |
| 2 | Real | Madrid |
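
When the origin of each row matters, `pd.concat()` also takes a `keys=` argument that tags every input and yields a two-level row index. A minimal sketch with small hypothetical frames `df_a` and `df_b`:

```python
import pandas as pd

df_a = pd.DataFrame({'First': [66, 77], 'Second': [88, 99]})
df_b = pd.DataFrame({'First': [1, 2], 'Second': [22, 44]})

# keys= labels each input, producing a two-level (MultiIndex) row index
combined = pd.concat([df_a, df_b], keys=['a', 'b'])
print(combined)

# Selecting on the outer level recovers the rows that came from one input
print(combined.loc['b'])
```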


In [9]:
# Replace df_one with a DataFrame that holds data
data = {
    'First':[66, 77, 'Hello'], 
    'Second':[88, 99, 'CS4580']
}
# Build the DataFrame from the dict
df_one = pd.DataFrame(data)
# Concatenate; each input keeps its own index, so the labels 0, 1, 2 repeat
df_three = pd.concat([df_one, df_two])
df_three

|   | First | Second |
|---|-------|--------|
| 0 | 66 | 88 |
| 1 | 77 | 99 |
| 2 | Hello | CS4580 |
| 0 | 1 | 22 |
| 1 | 2 | 44 |
| 2 | Real | Madrid |
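
The repeated labels can be avoided at combine time by passing `ignore_index=True` to `pd.concat()`. The sketch rebuilds the two frames from the cells above and also shows why duplicate labels are awkward: `.loc[0]` then matches two rows (`dup` is an illustrative name).

```python
import pandas as pd

df_one = pd.DataFrame({'First': [66, 77, 'Hello'], 'Second': [88, 99, 'CS4580']})
df_two = pd.DataFrame({'First': [1, 2, 'Real'], 'Second': [22, 44, 'Madrid']})

# ignore_index=True discards the input indexes and numbers the result 0..n-1
df_three = pd.concat([df_one, df_two], ignore_index=True)
print(df_three.index.tolist())  # [0, 1, 2, 3, 4, 5]

# Without it, the label 0 matches one row from each input
dup = pd.concat([df_one, df_two])
print(len(dup.loc[0]))  # 2
```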


In [10]:
# Renumber the rows 0..n-1; drop=True discards the old index instead of keeping it as a column
df_three.reset_index(drop=True, inplace=True)
df_three

|   | First | Second |
|---|-------|--------|
| 0 | 66 | 88 |
| 1 | 77 | 99 |
| 2 | Hello | CS4580 |
| 3 | 1 | 22 |
| 4 | 2 | 44 |
| 5 | Real | Madrid |
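
`reset_index()` can also keep the old labels. The sketch below contrasts the two settings on a small hypothetical frame (`kept` and `dropped` are illustrative names). Without `inplace=True`, the method returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.concat([
    pd.DataFrame({'First': [66, 77]}),
    pd.DataFrame({'First': [1, 2]}),
])

# drop=False (the default) moves the old index into a new column named 'index'
kept = df.reset_index()
print(kept.columns.tolist())  # ['index', 'First']

# drop=True simply discards the old labels
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())  # ['First']
```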


In [13]:
# Combine DataFrames column-wise
col_data = {
    'Third': [88, 99, 11]
}
df_four = pd.DataFrame(col_data)
# Now combine them by columns, add the axis=1 parameter
df_combined = pd.concat([df_three, df_four], axis=1)
df_combined

|   | First | Second | Third |
|---|-------|--------|-------|
| 0 | 66 | 88 | 88.0 |
| 1 | 77 | 99 | 99.0 |
| 2 | Hello | CS4580 | 11.0 |
| 3 | 1 | 22 | NaN |
| 4 | 2 | 44 | NaN |
| 5 | Real | Madrid | NaN |
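
The missing values come from index alignment. `axis=1` matches rows by label, and the default `join='outer'` keeps labels 3 to 5 even though `df_four` has no values for them. Those cells are filled with NaN, which also turns the integer column into floats. The sketch below shows `join='inner'`, which keeps only shared labels; `df_left` and `df_right` are stand-ins for `df_three` and `df_four`.

```python
import pandas as pd

df_left = pd.DataFrame({'First': [66, 77, 'Hello', 1, 2, 'Real']})
df_right = pd.DataFrame({'Third': [88, 99, 11]})

# Default join='outer': every label is kept, gaps become NaN, ints become floats
outer = pd.concat([df_left, df_right], axis=1)
print(outer['Third'].dtype)  # float64

# join='inner': only labels present in both inputs (0, 1, 2) survive
inner = pd.concat([df_left, df_right], axis=1, join='inner')
print(inner.shape)  # (3, 2)
```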


## Task: Sales Data


In [20]:
# TODO: Define a df called df_sales with two columns: Date, Amount
import pandas as pd
df_sales = pd.DataFrame(columns=['Date', 'Amount'])
df_sales

|   | Date | Amount |
|---|------|--------|
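
Besides concatenating a second DataFrame, you can add a single row to an existing frame by assigning to a new label with `.loc`. A minimal sketch, assuming the default integer index so that `len(df_sales)` is the next free label. For many rows, it is generally faster to build them all first and call `pd.concat()` once.

```python
import pandas as pd

df_sales = pd.DataFrame(columns=['Date', 'Amount'])

# Assigning to a label that does not exist yet enlarges the frame by one row
df_sales.loc[len(df_sales)] = ['2024-01-23', 150]
df_sales.loc[len(df_sales)] = ['2024-01-12', 250]
print(df_sales)
```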


In [21]:

# TODO: Add sample data: 3 rows
sales_data = {
    'Date':['2024-01-23', '2024-01-12', '2024-02-15'],
    'Amount':[150, 250, 125]
}
df_sales_new = pd.DataFrame(sales_data)
df_sales_new

|   | Date | Amount |
|---|------|--------|
| 0 | 2024-01-23 | 150 |
| 1 | 2024-01-12 | 250 |
| 2 | 2024-02-15 | 125 |
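
At this point the `Date` values are plain strings. The sketch below parses them with `pd.to_datetime()` so they sort chronologically and expose date parts through the `.dt` accessor. This step is optional and not part of the original task.

```python
import pandas as pd

sales_data = {
    'Date': ['2024-01-23', '2024-01-12', '2024-02-15'],
    'Amount': [150, 250, 125],
}
df_sales_new = pd.DataFrame(sales_data)

# Parse the ISO 'YYYY-MM-DD' strings into timestamps
df_sales_new['Date'] = pd.to_datetime(df_sales_new['Date'])

# Rows now sort chronologically, and date parts are available via .dt
print(df_sales_new.sort_values('Date')['Amount'].tolist())  # [250, 150, 125]
print(df_sales_new['Date'].dt.month.tolist())               # [1, 1, 2]
```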


In [22]:

# TODO: Combine them, and make sure indexes are correct
df_sales = pd.concat([df_sales, df_sales_new], ignore_index=True)
print(df_sales)

         Date Amount
0  2024-01-23    150
1  2024-01-12    250
2  2024-02-15    125


In [23]:

# TODO: Create a new df with two more rows of data, same columns as df_sales
df_sales_two = pd.DataFrame({
    'Date': ['2024-08-12', '2024-08-24'],
    'Amount': [450, 234],
})
df_sales_two

|   | Date | Amount |
|---|------|--------|
| 0 | 2024-08-12 | 450 |
| 1 | 2024-08-24 | 234 |


In [24]:
# Combine them; without ignore_index=True the labels 0 and 1 repeat
df_sales = pd.concat([df_sales, df_sales_two])
df_sales

|   | Date | Amount |
|---|------|--------|
| 0 | 2024-01-23 | 150 |
| 1 | 2024-01-12 | 250 |
| 2 | 2024-02-15 | 125 |
| 0 | 2024-08-12 | 450 |
| 1 | 2024-08-24 | 234 |


In [25]:
# Reset the index
df_sales.reset_index(drop=True, inplace=True)
df_sales

|   | Date | Amount |
|---|------|--------|
| 0 | 2024-01-23 | 150 |
| 1 | 2024-01-12 | 250 |
| 2 | 2024-02-15 | 125 |
| 3 | 2024-08-12 | 450 |
| 4 | 2024-08-24 | 234 |
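
With the index in order, column-level summaries take one call each. The sketch below rebuilds `df_sales` from the values shown above and computes the total and mean of `Amount`.

```python
import pandas as pd

df_sales = pd.DataFrame({
    'Date': ['2024-01-23', '2024-01-12', '2024-02-15', '2024-08-12', '2024-08-24'],
    'Amount': [150, 250, 125, 450, 234],
})

# Total and average of the Amount column
print(df_sales['Amount'].sum())   # 1209
print(df_sales['Amount'].mean())  # 241.8

# ISO 'YYYY-MM-DD' strings also sort chronologically as plain text
print(df_sales.sort_values('Date'))
```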


In [26]:
# TODO: Add a new column called 'Product' to df_sales. Only 3 values are given for its 5 rows;
# a Series is aligned on the index, so rows 3 and 4 get NaN
new_column = pd.Series(['A', 'B', 'C'])
df_sales['Product'] = new_column
df_sales


|   | Date | Amount | Product |
|---|------|--------|---------|
| 0 | 2024-01-23 | 150 | A |
| 1 | 2024-01-12 | 250 | B |
| 2 | 2024-02-15 | 125 | C |
| 3 | 2024-08-12 | 450 | NaN |
| 4 | 2024-08-24 | 234 | NaN |
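
The NaN values appear because the new column is aligned on the index. A plain Python list is not aligned and must match the row count exactly. The sketch below shows both cases on a stand-in `df_sales`.

```python
import pandas as pd

df_sales = pd.DataFrame({'Amount': [150, 250, 125, 450, 234]})

# A plain list must provide exactly one value per row
try:
    df_sales['Product'] = ['A', 'B', 'C']
except ValueError as err:
    print('ValueError:', err)

# A Series is aligned on the index; unmatched rows (3 and 4) get NaN
df_sales['Product'] = pd.Series(['A', 'B', 'C'])
print(df_sales['Product'].isna().sum())  # 2
```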


## Working with JSON Files
Pandas reads JSON files directly with the `pd.read_json()` function, which returns a DataFrame.
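
`read_json()` accepts a path or any file-like object. Here is a minimal sketch that reads JSON from an in-memory string instead of a file. The JSON text is hypothetical, written in the column-oriented layout that `to_json()` produces by default. pandas 2.1 and later warn when literal JSON text is passed directly, so the string is wrapped in `StringIO`.

```python
import pandas as pd
from io import StringIO

# Hypothetical JSON text: one object per column, keyed by row label
raw = '{"Column 1": {"0": 1, "1": 3, "2": 5}, "Column 2": {"0": 2, "1": 4, "2": 6}}'

# StringIO makes the string file-like, so read_json treats it like a file
df = pd.read_json(StringIO(raw))
print(df)
```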

In [1]:
import pandas as pd

# Load json file
df_json_data = pd.read_json('../data/example-1.json')

In [2]:
# Display df
df_json_data

|   | Column 1 | Column 2 |
|---|----------|----------|
| 0 | 1 | 2 |
| 1 | 3 | 4 |
| 2 | 5 | 6 |


In [3]:
# Convert the DataFrame to a JSON string (the default orient='columns' gives one object per column)
json_format = df_json_data.to_json()
json_format

'{"Column 1":{"0":1,"1":3,"2":5},"Column 2":{"0":2,"1":4,"2":6}}'
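
The `orient` parameter of `to_json()` controls the layout of the string. The sketch below compares the default with `orient='records'`, which emits one JSON object per row.

```python
import pandas as pd

df_json_data = pd.DataFrame({'Column 1': [1, 3, 5], 'Column 2': [2, 4, 6]})

# Default for a DataFrame: orient='columns'
print(df_json_data.to_json())
# {"Column 1":{"0":1,"1":3,"2":5},"Column 2":{"0":2,"1":4,"2":6}}

# orient='records': a list with one object per row
records = df_json_data.to_json(orient='records')
print(records)
# [{"Column 1":1,"Column 2":2},{"Column 1":3,"Column 2":4},{"Column 1":5,"Column 2":6}]
```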

## Working with CSV Files

Load a CSV file into a DataFrame with the `pd.read_csv()` function.
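
`read_csv()` also accepts file-like objects, which is handy for quick experiments. The sketch below uses hypothetical CSV text modeled on the output that follows. The actual contents of `example-1.csv` are not shown in this notebook.

```python
import pandas as pd
from io import StringIO

# Hypothetical CSV text standing in for a file on disk
csv_text = """Branch,Date,Amount
Branch A,January 1,500
Branch B,January 2,250
Branch A,January 3,300
"""

# The first line becomes the header by default
df = pd.read_csv(StringIO(csv_text))
print(df.columns.tolist())  # ['Branch', 'Date', 'Amount']
print(df.shape)             # (3, 3)
```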

In [4]:
df_csv_format = pd.read_csv('../data/example-1.csv')
# Display it
df_csv_format

|   | Branch | Date | Amount |
|---|--------|------|--------|
| 0 | Branch A | January 1 | 500.0 |
| 1 | Branch B | January 2 | 250.0 |
| 2 | Branch A | January 3 | 300.0 |


In [6]:
# Read a file that has no header row; header=None numbers the columns 0, 1, 2
df_csv_format = pd.read_csv('../data/example-2.csv', header=None)
df_csv_format

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | Branch A | January 1 | 500.0 |
| 1 | Branch B | January 2 | 250.0 |
| 2 | Branch A | January 3 | 300.0 |
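
With `header=None` the columns are numbered. To get readable labels while still treating the first line as data, pass `names=`. The sketch below uses hypothetical header-less text.

```python
import pandas as pd
from io import StringIO

# Hypothetical CSV text with no header line
csv_text = """Branch A,January 1,500
Branch B,January 2,250
Branch A,January 3,300
"""

# names= supplies column labels; every line, including the first, stays data
df = pd.read_csv(StringIO(csv_text), header=None, names=['Branch', 'Date', 'Amount'])
print(df.columns.tolist())  # ['Branch', 'Date', 'Amount']
print(len(df))              # 3
```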


In [9]:
# Save the data back to csv file
# If you do not need the index information, use, index=None
df_csv_format.to_csv('test.csv', index=None)
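
Called without a path, `to_csv()` returns the CSV text as a string, which makes the effect of the `index` parameter easy to inspect. The sketch below uses a small hypothetical frame.

```python
import pandas as pd
from io import StringIO

df = pd.DataFrame({
    'Branch': ['Branch A', 'Branch B'],
    'Amount': [500.0, 250.0],
})

# With no path, to_csv() returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)
print(with_index.splitlines()[0])     # ,Branch,Amount
print(without_index.splitlines()[0])  # Branch,Amount

# Reading the index-free text back reproduces the original DataFrame
round_trip = pd.read_csv(StringIO(without_index))
print(round_trip.equals(df))  # True
```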