# Intro to Pandas 

Tonight we're starting our journey through the Pandas library (created by Wes McKinney). Most of the time, when we talk about working in Pandas, we mean working with a DataFrame, which is what we'll focus on tonight. DataFrames are objects that hold our data, allowing us to interact with it, manipulate it, and eventually throw it into machine learning algorithms (if we want).

Wait, did you say that the Pandas DataFrame is an **object**? I did! What this means is that we're going to interact with Pandas DataFrames in much the same way that we interact with all of our other objects in Python. Before we get to actually interacting with one, though, we'll have to get one, and get one with data in it! And there's one quick step that we have to do before that...

### Pandas Import 

```python
import pandas as pd # Standard import. 
```

Here I've shown how we get access to everything in the Pandas library - we import it! Also note that I placed the Python comment `# Standard import.` to the right of our import. As you might have guessed, this is to note that it's the standard way to import the Pandas library. If you are importing the entire Pandas library, be sure to follow this syntax. It's common practice, and we tend to follow common practice whenever possible.
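To make the alias a little more concrete, here is a minimal sketch (the variable names `module_name` and `dataframe_is_class` are just for illustration). It shows that `pd` is simply another name for the pandas module, and that `pd.DataFrame` is a regular Python class, which is why the things we build from it are objects.

```python
import pandas as pd  # Standard import.

# `pd` is only an alias; the module it points to is still called "pandas".
module_name = pd.__name__

# `pd.DataFrame` is an ordinary Python class, so anything we build
# with it will be an ordinary Python object.
dataframe_is_class = isinstance(pd.DataFrame, type)

print(module_name)         # pandas
print(dataframe_is_class)  # True
```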

### Getting a DataFrame Object

There are two basic ways that we can get a Pandas DataFrame object to work with. The first is by using data that is already in our Python program in conjunction with the `DataFrame` constructor, and the second is by reading in external data through the pandas module (which, remember, we've imported and made accessible via `pd`).

##### Using data already in our Python program

If we are using data that is already in our Python program, then we are going to be passing that data to the `DataFrame` constructor. We typically do this in one of two ways. The first involves passing in a list of dictionaries...

```python
import pandas as pd # I haven't actually done this in code yet.
data_lst = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6, 'd': 7}]
df = pd.DataFrame(data_lst)
df
```

|   | a | b | c | d   |
|---|---|---|---|-----|
| 0 | 1 | 2 | 3 | NaN |
| 1 | 4 | 5 | 6 | 7.0 |


Neat, but what's going on here? How do I read that DataFrame output right above, and how did that list of dictionaries translate to that DataFrame? 

Each and every one of our Pandas DataFrames will consist of **rows** and **columns**, where the columns will be denoted and accessed via their names, and the rows will be denoted and accessed via the indices of the DataFrame. So above, we can look at our columns and see that our column names are `a`, `b`, `c`, and `d`. We can similarly look at our rows and see that they are indexed by `0` and `1`. These column names and indices are how we will access this data later on. So how did the `DataFrame` constructor take our list of dictionaries and put it into the DataFrame in that format?

When the `DataFrame` constructor encounters a list of dictionaries like the one we gave it, it interprets each dictionary as a row in the DataFrame, with the keys in a given dictionary being the columns for that row and the values being the values for each column. By default, the constructor will create a column for **every** key that it sees in **any** dictionary in the list. If a particular dictionary in that list doesn't have a value for a key, then that row gets a `NaN` value in that column. Therefore, when the constructor above got our list of dictionaries, it saw the keys `a`, `b`, `c`, and `d`, and thus created those columns. It then filled in the values associated with those keys from each dictionary in our list, filling in a `NaN` wherever it didn't find a key (like it didn't find `d` in the first dictionary in our list).
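We can check that explanation with a small sketch. The names `rows`, `column_names`, `row_labels`, and `missing_d` are just for illustration; the data is the same as above. It also shows why the `7` displays as `7.0`: `NaN` is a floating-point value, so the whole `d` column is stored as floats.

```python
import pandas as pd

# Same input as above: the first dictionary has no 'd' key.
rows = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6, 'd': 7}]
df = pd.DataFrame(rows)

column_names = list(df.columns)      # one column per key seen in any dictionary
row_labels = list(df.index)          # default integer index, one label per dictionary
missing_d = df['d'].isna().tolist()  # True where 'd' was filled in with NaN

print(column_names)   # ['a', 'b', 'c', 'd']
print(row_labels)     # [0, 1]
print(missing_d)      # [True, False]
print(df['d'].dtype)  # float64
```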

In [5]:
data_lst = [{'a': 1, 'b': 2, 'c':3}, {'a': 4, 'b':5, 'c':6, 'd':7}]
df = pd.DataFrame(data_lst)