# Pandas: Not the plural form of panda (even though it is)

Pandas is one of the "Big Three" Python packages for data analysis and scripting: NumPy, Pandas, and Matplotlib. These open-source libraries are imported at the top of almost every data analysis script as follows:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Backstory, Backstory, Backstory
The reason these three show up everywhere, and the reason Python packages in general are so popular, is a wee bit of a long story:

As you are aware (or at least have been told), your computer only understands raw binary, using certain pre-defined binary instructions. To transform source code (which is really just a specific type of text document) into the appropriate binary machine code, you either run an interpreter that executes your program (like Python does), or run a compiler that creates a binary program file from your code (which is what C++ and friends do).

Python's standard interpreter (CPython) is primarily written in C. Because of this, people can write C and C++ code, compile it, and import it into Python files. This is helpful because for many applications, compiled C and C++ code can be as much as 40 times faster than pure Python, depending on the task.

Because of this, common tasks that take a lot of time are often outsourced to imported C and C++ code. Three such tasks are: heavy numerical computation (NumPy), handling and manipulating tabular data and files (Pandas), and creating images and graphs (Matplotlib).
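
To get a rough feel for the difference, here is a minimal sketch that adds up a million numbers twice: once with a plain Python loop, and once with NumPy's compiled `sum`. The variable names are just for illustration, and the timings depend entirely on your machine, so treat the printed numbers as a ballpark and not a benchmark.

```python
import time

import numpy as np

# The same million numbers, once as a Python list and once as a NumPy array
values = list(range(1_000_000))
array = np.arange(1_000_000, dtype=np.int64)

# Pure Python: the interpreter executes every single addition itself
start = time.perf_counter()
total_python = 0
for v in values:
    total_python += v
python_seconds = time.perf_counter() - start

# NumPy: the loop runs inside compiled code
start = time.perf_counter()
total_numpy = int(array.sum())
numpy_seconds = time.perf_counter() - start

print(f"pure Python loop: {python_seconds:.4f} s")
print(f"NumPy sum:        {numpy_seconds:.4f} s")
```

Both approaches compute the same total; only the time taken differs.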

## Pandas (again, but for real this time)

Most times, when working with data, the easiest way to store the data is in a table. For example, if we were working with a database for animals in a zoo, we might have a table like the following:

|name|id|species|feeding schedule|weight (lbs)|height (in)|age (years)|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|Echo|001|Dog|2|40|24|2.5|
|Evie|002|?|1|14|10|8|
|Simon|003|Horse|3|930|60|5|
|Lucifer|004|Cat|1|12|12|13|

But representing this natively in Python code would be confusing. We could keep track of it as a list of lists:

In [None]:
animals = [
    ["Echo",1,"Dog",2,40,24,2.5],
    ["Evie",2,"?",1,14,10,8],
    ["Simon",3,"Horse",3,930,60,5],
    ["Lucifer",4,"Cat",1,12,12,13]
]

But then accessing something would be weird, since we only have numeric indexes. If we wanted to get Evie's feeding schedule, we would have to write the following garbage:

In [None]:
row_number = 0

# loop through until we find the right row
while animals[row_number][0] != "Evie":
    row_number += 1

# and here we would have to know that column 3 corresponds to the feeding schedule
print(animals[row_number][3])
 

So then we could use a list of dictionaries. But then we would waste a lot of memory, because each dictionary carries its own hash table on top of the values it stores. It is also still pretty confusing and repetitive to initialize.

In [None]:
animals = [
    {"Name": "Echo","ID": 1,"Species":"Dog","Feeding Schedule": 2,"Weight": 40,"Height": 24,"Age": 2.5},
    # etc
]
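
To make the memory point a bit more concrete, here is a small sketch that compares the size of one row stored as a list against the same row stored as a dictionary. It assumes CPython, and `sys.getsizeof` only measures the container itself (not the strings and numbers inside it), so the exact byte counts will vary between Python versions.

```python
import sys

# One row of the zoo table, stored two different ways
row_as_list = ["Echo", 1, "Dog", 2, 40, 24, 2.5]
row_as_dict = {
    "Name": "Echo",
    "ID": 1,
    "Species": "Dog",
    "Feeding Schedule": 2,
    "Weight": 40,
    "Height": 24,
    "Age": 2.5,
}

# Size of each container in bytes (contents not included)
list_bytes = sys.getsizeof(row_as_list)
dict_bytes = sys.getsizeof(row_as_dict)

print(f"list row: {list_bytes} bytes")
print(f"dict row: {dict_bytes} bytes")
```

The gap looks small for one row, but it is paid again for every row in the table.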

The best option would be a list of class instances, but this only works if we know exactly what columns and datatypes to expect in advance. It doesn't work if we just want to crack open a random table or spreadsheet and go through the data.

In [None]:
class Animal:
    def __init__(self,name,id,species,sched,weight,height,age) -> None:
        self.name = name
        self.id = id
        # etc.

db = [
    Animal("Echo",1,"Dog",2,40,24,2.5),
    # etc.
]

## This is why we _*do have*_ nice things

Pandas handles all of these complicated, nasty things behind the scenes. It lets us (\**coughs*\*) easily create tables, access the data within them, and manipulate them. This does mean, however, that we have to learn the ways Pandas wants us to ask for things, and sometimes it is a little bit particular. But now that we have set the scene way too much, let's talk about using Pandas.

## Creating Pandas Dataframes

There are two *relatively* straightforward ways to create a pandas dataframe in code.

In [None]:
# Number One: lists and setting column names

# We first define a list of lists, where each list is one row in our table
animals = [
    ["Echo",1,"Dog",2,40,24,2.5],
    ["Evie",2,"?",1,14,10,8],
    ["Simon",3,"Horse",3,930,60,5],
    ["Lucifer",4,"Cat",1,12,12,13]
]

# We then initialize our dataframe by calling pandas's DataFrame initializer (DataFrame is a class)
# and passing in our list of lists. We also give the initializer a list of column names,
# one per entry in each row, in the same order
df = pd.DataFrame(animals, columns=["Name","ID", "Species","Feeding Schedule","Weight","Height","Age"])

In [None]:
print(df)

In [None]:
# Number Two: Dictionaries

# Or you can give the initializer a dictionary, where the keys are
# the column names, and each value is the list of entries in that column.
# This unfortunately means we have to be careful to keep every
# list in the same row order (and the same length)
animals = {
    "Name": ["Echo","Evie","Simon","Lucifer"],
    "ID": [1,2,3,4],
    "Species": ["Dog","?","Horse","Cat"],
    "Age" : [2.5,8,5,13]
}

# Passing the dictionary to the DataFrame initializer builds the table
df = pd.DataFrame(animals)

In [None]:
print(df)

In reality though, these are pretty much never used. Almost always, we just want to create a pandas dataframe from a .csv file of the data. For this example, we'll use the file `Heart_Disease.csv`. Clicking on the file in VSCode, you can see the raw form of a csv file. CSV stands for "Comma-Separated Values". It's usually displayed as a table (in software such as Excel), but in actuality it's just a bunch of lines of text, where each entry is separated by a comma.

In [None]:
df = pd.read_csv("Heart_Disease.csv")
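
To see the "it's just text" part for yourself without opening a file, here is a small sketch that builds a tiny CSV as a plain string and hands it to `pd.read_csv` through `io.StringIO`. The contents of `csv_text` are made up from the zoo table above and have nothing to do with `Heart_Disease.csv`.

```python
import io

import pandas as pd

# A CSV is just lines of text: a header line, then one line per row
csv_text = "name,species,age\nEcho,Dog,2.5\nLucifer,Cat,13\n"
print(csv_text)

# io.StringIO makes the string behave like an open file,
# so read_csv can parse it the same way it parses a real .csv
df_demo = pd.read_csv(io.StringIO(csv_text))
print(df_demo)
```

The first line of the text becomes the column names, and every following line becomes one row.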

## Accessing Data
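
As a minimal sketch (rebuilding the zoo table from earlier under the illustrative name `df_zoo`), here are a few common ways to pull data out of a dataframe: a whole column by name, a whole row by position, and a single value picked out with a condition. The last one is the Pandas version of the "find Evie's feeding schedule" loop from before.

```python
import pandas as pd

df_zoo = pd.DataFrame(
    [
        ["Echo", 1, "Dog", 2, 40, 24, 2.5],
        ["Evie", 2, "?", 1, 14, 10, 8],
        ["Simon", 3, "Horse", 3, 930, 60, 5],
        ["Lucifer", 4, "Cat", 1, 12, 12, 13],
    ],
    columns=["Name", "ID", "Species", "Feeding Schedule", "Weight", "Height", "Age"],
)

# A single column, selected by its name
names = df_zoo["Name"]

# A single row, selected by its position (0 is the first row)
first_row = df_zoo.iloc[0]

# Rows where Name is "Evie", keeping only the "Feeding Schedule" column
evie = df_zoo.loc[df_zoo["Name"] == "Evie", "Feeding Schedule"]
evie_schedule = evie.iloc[0]

print(names)
print(first_row)
print(evie_schedule)
```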

## Adding Data to a Dataframe
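
One way to do this, as a sketch on a cut-down version of the zoo table (the name `df_zoo` is just for illustration): a new column can be added by assigning a list to a new column name, and new rows can be appended by building a second dataframe and gluing the two together with `pd.concat`.

```python
import pandas as pd

df_zoo = pd.DataFrame(
    {
        "Name": ["Echo", "Evie"],
        "Species": ["Dog", "?"],
        "Age": [2.5, 8],
    }
)

# Add a column: the list needs one entry per existing row
df_zoo["Weight"] = [40, 14]

# Add a row: build a one-row dataframe with the same columns...
new_row = pd.DataFrame(
    [{"Name": "Simon", "Species": "Horse", "Age": 5, "Weight": 930}]
)

# ...and stack it under the original. ignore_index=True renumbers the rows 0, 1, 2
df_zoo = pd.concat([df_zoo, new_row], ignore_index=True)

print(df_zoo)
```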

## Removing Data from a Dataframe
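
A short sketch of the usual options, again on an illustrative cut-down table called `df_zoo`: `drop` removes columns or rows by label, and a condition keeps only the rows you want. Note that each of these returns a new dataframe and leaves the original untouched.

```python
import pandas as pd

df_zoo = pd.DataFrame(
    {
        "Name": ["Echo", "Evie", "Simon"],
        "Species": ["Dog", "?", "Horse"],
        "Age": [2.5, 8, 5],
    }
)

# Remove a column by name
without_age = df_zoo.drop(columns=["Age"])

# Remove a row by its index label (here, the first row)
without_first = df_zoo.drop(index=0)

# Remove rows by condition: keep everything that is NOT Evie
without_evie = df_zoo[df_zoo["Name"] != "Evie"]

print(without_age)
print(without_first)
print(without_evie)
```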

## Looping through a Dataframe
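
As a sketch of two ways to go row by row (using an illustrative table called `df_zoo`): `iterrows` hands you each row's index and the row itself, while `itertuples` hands you each row as a lightweight tuple with the column names as attributes. For simple math on a whole column, it is usually easier to skip the loop and call a column method like `sum`.

```python
import pandas as pd

df_zoo = pd.DataFrame(
    {
        "Name": ["Echo", "Evie", "Simon"],
        "Species": ["Dog", "?", "Horse"],
        "Age": [2.5, 8, 5],
    }
)

# iterrows: each step gives (index, row), and row works like a dictionary
labels = []
for index, row in df_zoo.iterrows():
    labels.append(f"{index}: {row['Name']} the {row['Species']}")
print(labels)

# itertuples: each step gives a tuple whose fields are the column names
ages = []
for animal in df_zoo.itertuples(index=False):
    ages.append(animal.Age)
print(ages)

# No loop needed for whole-column math
total_age = df_zoo["Age"].sum()
print(total_age)
```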

## Saving a Dataframe as a File
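
A minimal sketch of writing a dataframe back out as a csv with `to_csv`. The file name `zoo.csv` and the temporary folder are only there so the example cleans up after itself; in a real script you would pass whatever path you want to save to. Passing `index=False` leaves out the row numbers, which you usually don't want in the file.

```python
import os
import tempfile

import pandas as pd

df_zoo = pd.DataFrame(
    {
        "Name": ["Echo", "Evie"],
        "Species": ["Dog", "?"],
        "Age": [2.5, 8],
    }
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "zoo.csv")  # hypothetical file name

    # Write the table as comma separated text, without the row index
    df_zoo.to_csv(path, index=False)

    # Peek at the raw text that was written
    with open(path) as f:
        saved_text = f.read()

    # Read it back in to check the round trip
    reloaded = pd.read_csv(path)

print(saved_text)
print(reloaded)
```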