# Intro to Pandas Library

The pandas library is built mostly around two data types:
 - Series
 - DataFrames (the type we mostly work with): a 2D object that represents tabular data
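
To make the two types concrete, here is a minimal sketch. The names `prices` and `table` and the sample values are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
prices = pd.Series([1.5, 2.0, 3.25], name="price")

# A DataFrame is a two-dimensional table; each column is a Series
table = pd.DataFrame({"item": ["a", "b", "c"], "price": prices})

print(type(table["price"]).__name__)  # Series
print(table.shape)                    # (3, 2)
```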

 ## Characteristics of DataFrames
  - Mutable
  - Index based
  - A full class, with its own attributes and methods
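
A short sketch of what these characteristics look like in practice. The frame `df` and its row labels are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({"First": [1, 2, 3]}, index=["a", "b", "c"])

# Index based: look up a value by its row label and column name
row_b = df.loc["b", "First"]

# Mutable: change a value and add a column on the existing object
df.loc["b", "First"] = 20
df["Second"] = [10, 20, 30]

# A full class: the object is an instance of pd.DataFrame with methods such as head()
print(isinstance(df, pd.DataFrame))
print(df.head())
```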

  ## Classic data representation formats
   - JSON (JavaScript Object Notation)
   - CSV
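
To see what the two formats look like as plain text, the sketch below writes one small, made-up frame to both. `orient="records"` is just one of several JSON layouts pandas supports.

```python
import pandas as pd

df = pd.DataFrame({"First": [1, 2], "Second": [22, 44]})

# CSV: one header line, then one comma-separated line per row
csv_text = df.to_csv(index=False)

# JSON: here a list of objects, one per row
json_text = df.to_json(orient="records")

print(csv_text)
print(json_text)
```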

In [1]:
import pandas as pd

If the import fails, install pandas first with `conda install pandas`.

In [2]:
# Create a new, empty DataFrame
df_one = pd.DataFrame(columns=['Column1', 'Column2'])
# Picture an empty Excel sheet: `columns` takes a list of strings as the column names
df_one

Unnamed: 0,Column1,Column2


In [3]:
# Another way: rename the columns by assigning a new list to .columns
df_one.columns = ['First', 'Second']
df_one

Unnamed: 0,First,Second
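
Assigning to `.columns` replaces every name at once. As a hedged alternative, `rename()` changes only the names you list and returns a new frame. The sketch below uses its own small frame rather than `df_one`.

```python
import pandas as pd

df = pd.DataFrame(columns=["Column1", "Column2"])

# rename() takes an old-name -> new-name mapping and leaves df untouched
renamed = df.rename(columns={"Column1": "First"})

print(list(df.columns))       # original names are unchanged
print(list(renamed.columns))  # only 'Column1' was renamed
```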


In [4]:
# Add some rows
data1 = {
    'First':[1, 2, 'Real'], # column 'First': values must be a list
    'Second':[22, 44, 'Madrid'] # key= column, values = list of data
}
# Create a new df with data
df_two = pd.DataFrame(data1)
df_two

Unnamed: 0,First,Second
0,1,22
1,2,44
2,Real,Madrid


In [None]:
# Two DataFrames can also be combined
df_three = pd.concat([df_one, df_two])
df_three  # df_one has no rows yet, so only the 3 rows from df_two show up

In [7]:
# Adding data to df_one now
data = {
    'First':[66, 77, 'Hello'],
    'Second': [88, 99, 'CS4580']
}
# Add it to frame
df_one = pd.DataFrame(data)
# Combine one and two
df_three = pd.concat([df_one, df_two])

df_three

Unnamed: 0,First,Second
0,66,88
1,77,99
2,Hello,CS4580
0,1,22
1,2,44
2,Real,Madrid


In [8]:
# Fix the indices -- renumber them after combining the dfs
df_three.reset_index(drop=True, inplace=True)
df_three

Unnamed: 0,First,Second
0,66,88
1,77,99
2,Hello,CS4580
3,1,22
4,2,44
5,Real,Madrid
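
The renumbering can also happen during the concat itself with `ignore_index=True`. A small sketch with made-up frames `df_a` and `df_b`:

```python
import pandas as pd

df_a = pd.DataFrame({"First": [66, 77], "Second": [88, 99]})
df_b = pd.DataFrame({"First": [1, 2], "Second": [22, 44]})

# Default: each frame keeps its own row labels, so labels repeat
stacked = pd.concat([df_a, df_b])

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([df_a, df_b], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```

Both routes give the same table here; `reset_index(drop=True)` is handy when the frame already exists.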


In [11]:
# Add df column-wise
col_data = {
    'Third' : [88, 99, 11]
}
df_four = pd.DataFrame(col_data)
# Now combine them by columns: pass the frames as a list and add the axis=1 param
df_combined = pd.concat([df_three, df_four], axis=1)  # without axis=1, concat stacks the frames row-wise
df_combined

Unnamed: 0,First,Second,Third
0,66,88,88.0
1,77,99,99.0
2,Hello,CS4580,11.0
3,1,22,
4,2,44,
5,Real,Madrid,
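
The empty cells and the `.0` suffixes above come from how `concat` lines rows up by index label. A minimal sketch with invented frames `left` and `right`:

```python
import pandas as pd

left = pd.DataFrame({"First": [66, 77, 1]})  # row labels 0, 1, 2
right = pd.DataFrame({"Third": [88, 99]})    # row labels 0, 1

# axis=1 places the frames side by side, matching rows by index label
combined = pd.concat([left, right], axis=1)

# Label 2 has no match in `right`, so 'Third' is NaN there,
# and the integer column is stored as float to hold the NaN
print(combined["Third"].isna().tolist())  # [False, False, True]
print(combined["Third"].dtype)            # float64
```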


# Task: Sales Data

In [20]:
# Define a df called df_sales with two columns: Date, Amount
df_sales = pd.DataFrame(columns=['Date', 'Amount'])
#  Add sample data: 3 rows
sales_data = {
    'Date': ['10/10/2020 23:59', '5/6/2007 08:09', '10/11/2012 13:14'],
    'Amount': ['23.59', '12.22', '9.11']
}
df_sales = pd.DataFrame(sales_data)
# TODO: Create a new df with three more rows of data, same columns as df_sales
more_data = {
    'Date': ['1/3/1995 07:09','2/4/1996 08:10','3/6/1999 12:15'],
    'Amount': ['150','220','121.20']
}
df_more_sales = pd.DataFrame(more_data)
# TODO: Combine them, and make sure indexes are correct
df_sales = pd.concat([df_sales, df_more_sales], ignore_index=True)
# TODO: Add a new column called: 'Product' and combine it with original df_sales
# Only 3 values are given here, so rows without a matching index label get NaN
new_column = pd.DataFrame({'Product':['A','B','C']})
df_sales['Product'] = new_column['Product']
df_sales



Unnamed: 0,Date,Amount,Product
0,10/10/2020 23:59,23.59,A
1,5/6/2007 08:09,12.22,B
2,10/11/2012 13:14,9.11,C
3,1/3/1995 07:09,150.0,
4,2/4/1996 08:10,220.0,
5,3/6/1999 12:15,121.2,


# Working with JSON Files

You can handle JSON files directly with Pandas using the `read_json()` method

In [4]:
import pandas as pd

# load json file
df_json_data = pd.read_json('../data/example-1.json')
#df_json_data = pd.read_json(r'C:\Users\K\Downloads\pandas01Data\example-1.json')

In [5]:
# Convert the DataFrame to a JSON-formatted string
json_format = df_json_data.to_json()
json_format

'{"Column 1":{"0":1,"1":3,"2":5},"Column 2":{"0":2,"1":4,"2":6}}'
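
The same round trip can be tried without a file on disk. This sketch reuses the JSON string shown above and wraps it in `io.StringIO` so `read_json()` treats it like a file. The `orient="records"` call is an extra illustration, not part of the original notebook.

```python
import io
import pandas as pd

json_text = '{"Column 1":{"0":1,"1":3,"2":5},"Column 2":{"0":2,"1":4,"2":6}}'

# Parse the column-oriented JSON into a DataFrame
df = pd.read_json(io.StringIO(json_text))

# Write it back out row by row instead of column by column
records = df.to_json(orient="records")

print(df)
print(records)
```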

In [None]:
# display df
df_json_data

# Working with CSV Files

With Pandas, use the `read_csv()` method

In [None]:
df_csv_format = pd.read_csv('../data/example-1.csv')
# Display it
df_csv_format

In [None]:
# Read a file without a header record: header=None keeps the first line as data
df_csv_format = pd.read_csv('../data/example-2.csv', header=None)
df_csv_format
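
When a file has no header record, `read_csv()` by default still treats the first line as column names. The sketch below shows the difference on an in-memory CSV. The text in `raw` and the column names are made up, since the contents of `example-2.csv` are not shown here.

```python
import io
import pandas as pd

raw = "1,2\n3,4\n5,6\n"  # three data rows, no header line

# Default: the first line is consumed as the header, leaving two rows
default = pd.read_csv(io.StringIO(raw))

# header=None: all three lines are data; columns are numbered 0, 1
no_header = pd.read_csv(io.StringIO(raw), header=None)

# names=...: supply your own column names for the headerless file
named = pd.read_csv(io.StringIO(raw), header=None, names=["Column 1", "Column 2"])

print(default.shape, no_header.shape, named.shape)
```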

In [None]:
# Save the data back to a CSV file
# If you do not need the index column, pass index=False
df_csv_format.to_csv('test.csv')
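
To see what the index setting changes, this sketch writes a small, made-up frame to strings instead of files and reads the results back.

```python
import io
import pandas as pd

df = pd.DataFrame({"First": [1, 2], "Second": [22, 44]})

with_index = df.to_csv()                # index written as an unnamed first column
without_index = df.to_csv(index=False)  # data columns only

# Reading the first version back turns the saved index into an extra column
reread = pd.read_csv(io.StringIO(with_index))
reread_clean = pd.read_csv(io.StringIO(without_index))

print(list(reread.columns))
print(list(reread_clean.columns))
```

If the index is meaningful, keeping it and reading the file back with `index_col=0` is another option.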