# Intro to the Pandas Library

The pandas library consists mostly of two data types:
- Series: A 1D labeled array, like a single column
- DataFrames: A 2D object to represent `tabular` data
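
To make the difference between the two types concrete, here is a minimal sketch. The names `amounts` and `df_types` and the sample values are made up for illustration.

```python
import pandas as pd

# A Series is a 1D labeled array: one column of values plus an index.
amounts = pd.Series([33, 44, 55], name="Amount")

# A DataFrame is a 2D table; each of its columns is itself a Series.
df_types = pd.DataFrame({"Amount": [33, 44, 55], "Product": ["A", "B", "C"]})

print(amounts.ndim)              # number of dimensions of the Series
print(df_types.ndim)             # number of dimensions of the DataFrame
print(type(df_types["Amount"]))  # selecting one column returns a Series
```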

## DataFrame Characteristics
- Mutable
- Index based
- A full class
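
The three characteristics above can be sketched as follows. This is only an illustration; `df_chars` and its row labels are hypothetical.

```python
import pandas as pd

df_chars = pd.DataFrame({"First": [1, 2, 3]}, index=["a", "b", "c"])

# Mutable: add a column and change a single cell in place.
df_chars["Second"] = [10, 20, 30]
df_chars.loc["b", "First"] = 99

# Index based: look rows up by label with .loc or by position with .iloc.
row_by_label = df_chars.loc["b"]
row_by_position = df_chars.iloc[1]

# A full class: DataFrame objects carry attributes and methods.
print(df_chars.shape)
print(row_by_label.equals(row_by_position))
```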

## Classic data representation formats
- JSON
- CSV

In [2]:
import pandas as pd

In [3]:
# Create a new DataFrame
df_one = pd.DataFrame(columns=['column1', 'column2'])
df_one

Unnamed: 0,column1,column2


In [5]:
# Another way: rename the columns by assigning to .columns
df_one.columns = ['First', 'Second']
df_one

Unnamed: 0,First,Second


In [6]:
# Add some rows
data1 = {
    'First': [1, 2, 'Real'],    # Key = column, values = List of data
    'Second': [22, 44, 'Madrid']
}
# Create new df with data
df_two = pd.DataFrame(data1)
df_two

Unnamed: 0,First,Second
0,1,22
1,2,44
2,Real,Madrid


In [7]:
# Combine two dfs
df_three = pd.concat([df_one, df_two])
df_three

Unnamed: 0,First,Second
0,1,22
1,2,44
2,Real,Madrid


In [8]:
# Add data to df_one
data = {
    'First': [66, 77, 'Hello'],
    'Second': [88, 99, 'CS4580']
}
df_one = pd.DataFrame(data)
# Combine two dfs
df_three = pd.concat([df_one, df_two])
df_three

Unnamed: 0,First,Second
0,66,88
1,77,99
2,Hello,CS4580
0,1,22
1,2,44
2,Real,Madrid


In [9]:
# Update indexes as you combine them
df_three.reset_index(drop=True, inplace=True)
df_three

Unnamed: 0,First,Second
0,66,88
1,77,99
2,Hello,CS4580
3,1,22
4,2,44
5,Real,Madrid
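
As an alternative to resetting the index afterwards, `pd.concat` accepts `ignore_index=True`, which renumbers the rows while combining. A small sketch with made-up frames `df_a` and `df_b`:

```python
import pandas as pd

df_a = pd.DataFrame({"First": [66, 77], "Second": [88, 99]})
df_b = pd.DataFrame({"First": [1, 2], "Second": [22, 44]})

# ignore_index=True discards the original row labels and
# numbers the combined rows from 0 upward.
df_ab = pd.concat([df_a, df_b], ignore_index=True)
print(df_ab.index.tolist())
```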


In [11]:
# Add df column-wise
col_data = {
    'Third': [88, 99, 11]
}
df_four = pd.DataFrame(col_data)
# Now combine them by columns, add the axis=1 parameter
df_combined = pd.concat([df_three, df_four], axis=1)
df_combined

Unnamed: 0,First,Second,Third
0,66,88,88.0
1,77,99,99.0
2,Hello,CS4580,11.0
3,1,22,
4,2,44,
5,Real,Madrid,
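
The empty cells and the `88.0` style values in the output come from index alignment: with `axis=1`, rows are matched by index label, missing rows are filled with `NaN`, and an integer column that receives `NaN` is converted to float. A minimal sketch, using hypothetical frames `left` and `right`:

```python
import pandas as pd

left = pd.DataFrame({"First": [1, 2, 3]})
right = pd.DataFrame({"Third": [88, 99]})  # one row shorter than left

# Rows are aligned on the index; row 2 has no match in right.
wide = pd.concat([left, right], axis=1)

print(wide["Third"].isna().tolist())  # which rows are missing a value
print(wide["Third"].dtype)            # integers become floats once NaN appears
```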


## Task: Sales Data

In [12]:
# TODO: Define a df called df_sales with two columns: Date, Amount
df_sales = pd.DataFrame(columns=["Date", "Amount"])
# TODO: Add sample data: 3 rows
sample_data = {
    "Date": ["01.01.2024", "05.01.2024", "08.01.2024"],
    "Amount": [33, 44, 55]
}
df_sales = pd.DataFrame(sample_data)
# TODO: Create a new df with two more rows of data, same columns as df_sales
sample_data2 = {
    "Date": ["10.01.2024", "15.01.2024"],
    "Amount": [77, 88]
}
df_sales_two = pd.DataFrame(sample_data2)
# TODO: Combine them, and make sure indexes are correct
df_sales = pd.concat([df_sales, df_sales_two])
df_sales.reset_index(drop=True, inplace=True)
# TODO: Add a new column called 'Product' with 4 rows of data, and combine it with the original df_sales
sample_data3 = {
    "Product": ["A", "B", "C", "D"]
}
df_sales_three = pd.DataFrame(sample_data3)
df_sales = pd.concat([df_sales, df_sales_three], axis=1)
df_sales

Unnamed: 0,Date,Amount,Product
0,01.01.2024,33,A
1,05.01.2024,44,B
2,08.01.2024,55,C
3,10.01.2024,77,D
4,15.01.2024,88,
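
Another way to add the `Product` column, shown here only as a sketch, is to assign a Series directly. The assignment aligns on the index just like `pd.concat(..., axis=1)`, so the fifth row is again left empty. The name `df_sales_demo` is hypothetical and the data mirrors the task above.

```python
import pandas as pd

df_sales_demo = pd.DataFrame({
    "Date": ["01.01.2024", "05.01.2024", "08.01.2024", "10.01.2024", "15.01.2024"],
    "Amount": [33, 44, 55, 77, 88],
})

# The Series has index 0..3, so row 4 gets no Product value.
df_sales_demo["Product"] = pd.Series(["A", "B", "C", "D"])
print(df_sales_demo["Product"].isna().sum())  # count of rows without a Product
```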


## Working with JSON Files
You can read JSON files directly with pandas using the `read_json()` function.

In [3]:
import pandas as pd

# Load json file
df_json_data = pd.read_json('../data/example-1.json')

# Display df
df_json_data

Unnamed: 0,Column 1,Column 2
0,1,2
1,3,4
2,5,6


In [4]:
# Convert dataframe to json formatted string
json_format = df_json_data.to_json()
json_format

'{"Column 1":{"0":1,"1":3,"2":5},"Column 2":{"0":2,"1":4,"2":6}}'
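
The string above uses the default column-oriented layout. The `orient` parameter changes that layout; the sketch below uses `orient="records"` to get one JSON object per row and then reads the string back. The frame `df_json_demo` is a stand-in for the file contents, which are assumed here.

```python
import io
import pandas as pd

df_json_demo = pd.DataFrame({"Column 1": [1, 3, 5], "Column 2": [2, 4, 6]})

# orient="records" writes a list with one JSON object per row.
records_json = df_json_demo.to_json(orient="records")
print(records_json)

# Wrap the string in StringIO so read_json treats it as a file-like object.
df_json_back = pd.read_json(io.StringIO(records_json), orient="records")
print(df_json_back)
```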

## Working with CSV Files
With pandas, use the `read_csv()` function to load a CSV file into a DataFrame.

In [5]:
df_csv_format = pd.read_csv('../data/example-1.csv')

# Display it
df_csv_format

Unnamed: 0,Branch,Date,Amount
0,Branch A,January 1,500.0
1,Branch B,January 2,250.0
2,Branch A,January 3,300.0


In [7]:
# Read a file that has no header record; pandas numbers the columns 0, 1, 2
df_csv_format = pd.read_csv('../data/example-2.csv', header=None)

# Display it
df_csv_format

Unnamed: 0,0,1,2
0,Branch A,January 1,500.0
1,Branch B,January 2,250.0
2,Branch A,January 3,300.0
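
When a file has no header record, you can supply column names yourself with the `names` parameter instead of keeping the numeric defaults. A minimal sketch; the inline text `raw_csv` is a hypothetical stand-in for a header-less file.

```python
import io
import pandas as pd

# Hypothetical contents of a CSV file without a header record
raw_csv = "Branch A,January 1,500.0\nBranch B,January 2,250.0\n"

# header=None: the first line is data; names= supplies the column labels.
df_named = pd.read_csv(
    io.StringIO(raw_csv),
    header=None,
    names=["Branch", "Date", "Amount"],
)
print(df_named.columns.tolist())
```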


In [10]:
# Save the data back to a csv file
# If you do not need the index information, use index=False
df_csv_format.to_csv('test.csv', index=False)
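
To see what the index option changes, the sketch below writes to a string instead of a file (`to_csv` returns a string when no path is given) and uses the boolean form `index=False`. The frame `df_out` is made up for illustration.

```python
import io
import pandas as pd

df_out = pd.DataFrame({"Branch": ["Branch A", "Branch B"], "Amount": [500.0, 250.0]})

with_index = df_out.to_csv()                # index written as an unnamed first column
without_index = df_out.to_csv(index=False)  # only the data columns are written

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the indexed version back produces an extra 'Unnamed: 0' column.
df_reread = pd.read_csv(io.StringIO(with_index))
print(df_reread.columns.tolist())
```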