## Source: https://pandas.pydata.org/pandas-docs/stable/index.html

### What is Pandas? 
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

### What data structures does Pandas use?
Pandas has two primary data structures: Series (1-D) and DataFrame (2-D).
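
As a quick, minimal sketch of the difference in dimensionality (the variable names and values here are made up for illustration), you can inspect the `ndim` and `shape` attributes of each structure:

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
s = pd.Series([10, 20, 30], name="values")

# A DataFrame is a two-dimensional labeled table.
df = pd.DataFrame({"values": [10, 20, 30], "labels": ["a", "b", "c"]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)

# Selecting a single column of a DataFrame gives back a Series.
print(type(df["values"]).__name__)  # Series
```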

### What to expect from this tutorial/notebook? 
This tutorial introduces Pandas at a high level and then walks you through:
- Data Acquisition 
- Data Cleaning 
- Data Filtering 
- Data Aggregation 
- Data Analysis (depending on time availability)

### How is this different from the countless other materials that are publicly available? 
It is by no means exhaustive or extensive. Rather, consider it my share of the lessons I picked up as I learned to use Python. I will be sharing tips and tricks that I found helpful, but if you know something better, you are welcome to share it with me/us. 

#### How to create data structures in Pandas?

In [7]:
# Import required libraries 
import pandas as pd
import os  

# Show the installed pandas version
print(pd.__version__)

1.0.3


In [9]:
# Creating a series 
my_numeric_series = pd.Series([2, 3, 5, 7], name="Primes_Under_10")
print(my_numeric_series)
my_character_series = pd.Series(["DesertPy", "SoCal_Python", "PyLadies_of_LA"], name="Some_Python_Meetups")
print(my_character_series)
my_mixed_series = pd.Series([2, "a", 4, "b"], name="Mixed_Series")
print(my_mixed_series)

0    2
1    3
2    5
3    7
Name: Primes_Under_10, dtype: int64
0          DesertPy
1      SoCal_Python
2    PyLadies_of_LA
Name: Some_Python_Meetups, dtype: object
0    2
1    a
2    4
3    b
Name: Mixed_Series, dtype: object


In [6]:
# Creating a data frame 
# Method 1 - from list of lists 
list_of_lists = [["Doug Ducey", "Arizona", 2023], ["Gavin Newsom", "California", 2023], 
                 ["Ron Desantis", "Florida", 2023], ["Andrew Cuomo", "New York", 2022],
                 ["Brian Kemp", "Georgia", 2023]]
governors_in_the_news_df = pd.DataFrame(data=list_of_lists, columns=["Name", "State", "Term_Expiry"])
print(governors_in_the_news_df)

print("******************************")
print("Separators for easier display")
print("******************************")

# Method 2 - from dictionary of lists
dict_of_lists = {"Name": ["Jay Inslee", "Ned Lamont", "Andy Beshear", "Roy Cooper"],
                "State": ["Washington", "Connecticut", "Kentucky", "North Carolina"],
                "Term_Expiry": [2021, 2023, 2023, 2021]}
governors_df = pd.DataFrame(data=dict_of_lists)
print(governors_df)

print("******************************")
print("Separators for easier display")
print("******************************")

# Method 3 - from list of dictionaries 
list_of_dicts = [{'USA': 50, 'Brazil': 26, 'Canada':10}]
states_in_countries_df = pd.DataFrame(data=list_of_dicts, index=["State_Count"])
print(states_in_countries_df)

print("******************************")
print("Separators for easier display")
print("******************************")

# Method 4 - from lists with zip 
stock_symbols = ["AAPL", "AMZN", "V", "MA"]
prices_i_wish_i_bought_them_at = [50, 10, 1, 78]
stocks_i_wanted_df = pd.DataFrame(data=list(zip(stock_symbols, prices_i_wish_i_bought_them_at)),
                                  columns=["Stock_Symobl", "Dream_Price"]) 
print(stocks_i_wanted_df)

print("******************************")
print("Separators for easier display")
print("******************************")

# Method 5 - dict of pd.Series 
dict_of_series = {'Place_I_Wanted_To_Be':
                      pd.Series(["New Zealand", "Fiji", "Bahamas"], index=["January", "February", "March"]),
                  'Place_I_Am_At':
                      pd.Series(["Home", "Home", "Home"], index=["January", "February", "March"])}
lockdown_mood_df = pd.DataFrame(dict_of_series)
print(lockdown_mood_df)

           Name       State  Term_Expiry
0    Doug Ducey     Arizona         2023
1  Gavin Newsom  California         2023
2  Ron Desantis     Florida         2023
3  Andrew Cuomo    New York         2022
4    Brian Kemp     Georgia         2023
******************************
Separators for easier display
******************************
           Name           State  Term_Expiry
0    Jay Inslee      Washington         2021
1    Ned Lamont     Connecticut         2023
2  Andy Beshear        Kentucky         2023
3    Roy Cooper  North Carolina         2021
******************************
Separators for easier display
******************************
             USA  Brazil  Canada
State_Count   50      26      10
******************************
Separators for easier display
******************************
  Stock_Symobl  Dream_Price
0         AAPL           50
1         AMZN           10
2            V            1
3           MA           78
******************************
Separators for easier display

## Takeaways-1
#### From the above examples, it is helpful to identify a few takeaways: 
- Series and DataFrame can represent most of the commonly used data sets. Constructing your data into a Series or a DataFrame allows you to leverage a lot of built-in functionality that Pandas offers 
- Series and DataFrame support both homogeneous and heterogeneous data, meaning they can hold values of a single data type as well as a mix of data types 
- Series and DataFrame have an index property which defaults to integers starting at 0 but can be set as desired (imagine time stamps, letters etc.)
- Pandas 1.0.0 deprecated the pandas.util.testing module and limited the public pandas.testing module to assertion functions. While not advisable, if you are using a version < 1.0.0, pandas.util.testing offers close to 30 different built-in functions to whip up different data frames that make it easy to test. You can get the list of possible functions like so: 

```
import pandas.util.testing as tm 
dataframe_constructor_functions = [i for i in dir(tm) if i.startswith('make')]
print(dataframe_constructor_functions)
```
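
On newer versions, the public pandas.testing module is the supported route for the assertion side of testing. The following is a minimal sketch, using made-up frames, of how `assert_frame_equal` behaves: it passes silently on a match and raises an `AssertionError` on a mismatch.

```python
import pandas as pd
import pandas.testing as pdt

# Two made-up frames with identical contents
expected = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
actual = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# Passes silently because the frames match
pdt.assert_frame_equal(actual, expected)

# A frame with one changed value triggers an AssertionError
different = actual.assign(b=[3.0, 5.0])
try:
    pdt.assert_frame_equal(different, expected)
    frames_match = True
except AssertionError:
    frames_match = False

print(frames_match)  # False
```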

- While all of these are good to know, a typical use case does not require you to create data by hand, but rather to import/acquire it from several different data sources, which leads us to our first topic: Data Acquisition 
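
To make the takeaways about data types and indexes concrete, here is a minimal sketch. The values and variable names are illustrative only; they are not part of the original notebook.

```python
import pandas as pd

# Homogeneous data gets a specific dtype; mixed data falls back to object
primes = pd.Series([2, 3, 5, 7])
mixed = pd.Series([2, "a", 4, "b"])
print(pd.api.types.is_integer_dtype(primes))  # True
print(mixed.dtype)                            # object

# The default index is a range of integers starting at 0
print(list(primes.index))  # [0, 1, 2, 3]

# A custom index made of letters
lettered = pd.Series([2, 3, 5, 7], index=["a", "b", "c", "d"])
print(lettered["c"])  # 5

# A custom index made of time stamps
dates = pd.date_range("2020-04-01", periods=3, freq="D")
daily = pd.Series([1.0, 2.0, 3.0], index=dates)
print(daily.loc[pd.Timestamp("2020-04-02")])  # 2.0
```

Label-based lookups such as `lettered["c"]` are what make a meaningful index useful: you select by what a row represents rather than by its position.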

## Data Acquisition 

#### One of the most powerful and appealing aspects of Pandas is its ability to easily acquire and ingest data from several different data sources including but not limited to: 
- CSV
- Text 
- JSON 
- HTML 
- Excel
- SQL

  An exhaustive list can be found here - https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
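
As a minimal sketch of two of the readers listed above, the snippet below parses CSV and JSON from in-memory text instead of real files, so it runs without any data on disk. The sample records are hypothetical and only serve to illustrate the calls.

```python
import io
import pandas as pd

# Hypothetical CSV content held in a string rather than a file
csv_text = "state,cases\nArizona,10\nCalifornia,20\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# The same hypothetical records expressed as JSON
json_text = '[{"state": "Arizona", "cases": 10}, {"state": "California", "cases": 20}]'
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```

With real data you would pass a file path or URL in place of the `io.StringIO` object.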

In [None]:
# CSV 
current_directory = os.getcwd()
# os.path.join picks the right path separator for the current operating system
raw_data_folder = os.path.join(current_directory, 'covid-19_data')