# Intro to Pandas Library

The pandas library consists mostly of two data types:
- Series: a 1D labeled array, like a single column
- DataFrame: a 2D object that represents tabular data
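
A minimal sketch of the two types follows. The names `ages` and `people` and the sample values are made up for illustration. It shows that selecting a single column of a DataFrame gives back a Series.

```python
import pandas as pd

# A Series: one-dimensional, labeled values (illustrative data)
ages = pd.Series([31, 25, 47], name='Age')

# A DataFrame: two-dimensional, tabular data built from columns
people = pd.DataFrame({
    'Name': ['Ana', 'Luis', 'Marta'],
    'Age': [31, 25, 47],
})

# Selecting one column of a DataFrame gives back a Series
age_column = people['Age']
print(type(ages).__name__)        # Series
print(type(age_column).__name__)  # Series
print(people.shape)               # (3, 2)
```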

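## DataFrame Characteristics
- Mutable: columns, rows, and values can be changed after creation
- Index based: every row and column has a label used for lookup and alignment
- A full class: `pd.DataFrame` objects carry their own attributes and methods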
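
The three characteristics above can be sketched as follows. The DataFrame `df` and its contents are assumptions made up for this example.

```python
import pandas as pd

# Illustrative data; the names here are assumptions for the example
df = pd.DataFrame({'City': ['Madrid', 'Lisbon'], 'Visits': [3, 1]})

# Mutable: the same object can be changed in place
df['Country'] = ['Spain', 'Portugal']   # add a column
df.loc[0, 'Visits'] = 4                 # overwrite one cell

# Index based: rows are addressed by index label (.loc) or position (.iloc)
first_by_label = df.loc[0]
first_by_position = df.iloc[0]

# A full class: DataFrame objects carry their own attributes and methods
print(df.shape)             # (2, 3)
print(df['Visits'].sum())   # 5
```

With the default index, label `0` and position `0` point at the same row. They can differ once rows are reordered or combined.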

### Classic data representation formats

- JSON: JavaScript Object Notation. It maps closely onto Python dictionaries and lists, so if you know dictionaries well you already know most of it. Best for nested or irregular data, or records with mixed data types.
- CSV: comma-separated values, like a sheet in an Excel spreadsheet. Quick to start working with, and a good fit when the data is already tabular.

Pandas works with both!
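
As a rough sketch of loading both formats, the example below reads the same made-up records from a CSV string and a JSON string. The sample data and variable names are assumptions, and `io.StringIO` stands in for a real file so the example needs nothing on disk.

```python
import io
import pandas as pd

# Made-up sample data held in strings so the example needs no files
csv_text = "Product,Amount\nA,245\nB,376\n"
json_text = '[{"Product": "A", "Amount": 245}, {"Product": "B", "Amount": 376}]'

# read_csv and read_json accept a path or a file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```

With real files, a path such as a `.csv` or `.json` filename would be passed in place of the `StringIO` object.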

In [1]:
import pandas as pd

In [2]:
# Create a new, empty DataFrame with two named columns
df_one = pd.DataFrame(columns=['Column1', 'Column2'])
df_one

Unnamed: 0,Column1,Column2


In [3]:
# Another way to set the column names: assign a list to .columns
df_one.columns = ['First', 'Second']
df_one

Unnamed: 0,First,Second


In [4]:
# Add some data (rows)
data1 = {
    'First':[1, 2, 'Real'],  # key = column, values = list
    'Second':[22, 44, 'Madrid']
}
# Create new df with data
df_two = pd.DataFrame(data1)
df_two

Unnamed: 0,First,Second
0,1,22
1,2,44
2,Real,Madrid
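
One side effect of mixing numbers and strings in a column, as in the cell above, is that pandas stores the column with the general-purpose `object` dtype. A small sketch, with made-up frames named `mixed` and `numeric`:

```python
import pandas as pd

# A column that mixes integers and a string
mixed = pd.DataFrame({'First': [1, 2, 'Real']})
# A column that holds only integers
numeric = pd.DataFrame({'First': [1, 2, 3]})

print(mixed['First'].dtype)    # object
print(numeric['First'].dtype)  # int64
```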


In [5]:
# Combine two dfs
df_three = pd.concat([df_one,df_two])
df_three

Unnamed: 0,First,Second
0,1,22
1,2,44
2,Real,Madrid


In [6]:
# Add data to df_one
data = {
    'First': [66, 77, 'Hello'],
    'Second': [88,99,'CS4580']
}
# Add it to frame
df_one = pd.DataFrame(data)
# Combine
df_three = pd.concat([df_one,df_two])
df_three

Unnamed: 0,First,Second
0,66,88
1,77,99
2,Hello,CS4580
0,1,22
1,2,44
2,Real,Madrid


In [7]:
# Reset the index so the combined rows are numbered 0..5
df_three.reset_index(drop=True, inplace=True)
df_three

Unnamed: 0,First,Second
0,66,88
1,77,99
2,Hello,CS4580
3,1,22
4,2,44
5,Real,Madrid
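
An alternative to calling `reset_index` afterwards is to pass `ignore_index=True` to `pd.concat`, which renumbers the rows during the concat itself. A minimal sketch with two small made-up frames (`df_a` and `df_b` are names chosen for this example):

```python
import pandas as pd

df_a = pd.DataFrame({'First': [66, 77], 'Second': [88, 99]})
df_b = pd.DataFrame({'First': [1, 2], 'Second': [22, 44]})

# Default: each frame keeps its own labels, so 0 and 1 repeat
stacked = pd.concat([df_a, df_b])
print(stacked.index.tolist())     # [0, 1, 0, 1]

# ignore_index=True renumbers the rows during the concat
renumbered = pd.concat([df_a, df_b], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```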


In [9]:
# Add df column-wise
col_data = {
    'Third' : [88,99,11]
}
df_four = pd.DataFrame(col_data)
# Now combine them by columns: pass axis=1 (rows are matched up by index label)
df_combined = pd.concat([df_three, df_four], axis=1)
df_combined

Unnamed: 0,First,Second,Third
0,66,88,88.0
1,77,99,99.0
2,Hello,CS4580,11.0
3,1,22,
4,2,44,
5,Real,Madrid,
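
The empty cells and the `88.0` values in the output above come from index alignment. Rows with no matching index label get `NaN`, and because `NaN` is a float, the integer column is stored as floats. A small sketch with made-up frames (`left` and `right` are names chosen for this example):

```python
import pandas as pd

left = pd.DataFrame({'First': [66, 77, 1]})
right = pd.DataFrame({'Third': [88, 99]})   # one row shorter

combined = pd.concat([left, right], axis=1)

# Row 2 has no match in `right`, so 'Third' is NaN there
print(combined['Third'].isna().tolist())  # [False, False, True]
# NaN is a float, so the integer column is upcast to float64
print(combined['Third'].dtype)            # float64
```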


## Task: Sales Data

In [18]:
import pandas as pd

# Define a df called df_sales with two columns: Date, Amount
# Add sample data: 3 rows
sales_data = {
    'Date' : ['9/9/24','12/1/23','8/24/24'],
    'Amount' : [245, 376, 856]
}

df_sales = pd.DataFrame(sales_data)

# Create a new df with three more rows of data, same columns as df_sales

sales_data2 = {
    'Date' : ['12/9/21','8/7/23','4/24/24'],
    'Amount' : [104, 876, 234]
}

df_sales2 = pd.DataFrame(sales_data2)

# Combine them, and make sure indexes are correct 

df_sales = pd.concat([df_sales, df_sales2], ignore_index=True)

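# Add a new column called 'Product' with 4 rows of data, and combine it with the original df_sales
new_column = pd.DataFrame({'Product': ['A', 'B', 'C', 'D']})
# Combine column-wise; rows 4 and 5 have no match in new_column, so they get NaN
df_sales = pd.concat([df_sales, new_column], axis=1)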
df_sales


Unnamed: 0,Date,Amount,Product
0,9/9/24,245,A
1,12/1/23,376,B
2,8/24/24,856,C
3,12/9/21,104,D
4,8/7/23,876,
5,4/24/24,234,
