# Introduction to Data Frames

The Pandas library offers a powerful data structure known as the Data Frame (`DataFrame` in code). A Data Frame is essentially a two-dimensional table with rows and columns. In this notebook, we will go over how to create a Data Frame from the data in a **CSV file**. CSV stands for Comma-Separated Values, a plain-text format that stores values separated by commas.

We will also learn how to make a Data Frame from scratch, and convert it to a CSV file. We will then learn how to extract a Data Frame from an Excel file, and vice versa.

---

Let's look at the cell below. First, we `import pandas as pd`. This imports the Pandas library and binds it to the shorter name `pd`, so everything inside Pandas is reachable through that name. We will use `pd` throughout this notebook to call functions from Pandas.

Alright, after that, we set a variable named `location` to the path of the CSV file we are looking for. This is simply a string; here it is a relative path, so it is resolved against the notebook's working directory.

Next, we create a variable called `df`, which will store our Data Frame! We call the `pd.read_csv()` function and pass `location` as the argument. The result:

In [10]:
import pandas as pd
location = "../DataSets/Simple/top_ten_cities_by_pop.csv"
df = pd.read_csv(location)
df

Unnamed: 0,City,Population,House Price
0,NYC,8500000,849000
1,Los Angeles,4000000,775000
2,Chicago,2700000,245000
3,Houston,2300000,282000
4,Philadelphia,1600000,165000
5,Phoneix,1500000,285000
6,San Antonio,1500000,235000
7,San Diego,1400000,649000
8,Dallas,1300000,385000
9,San Jose,1000000,750000
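
The `Unnamed: 0` label in the output deserves a note. Pandas gives that name to a column whose header cell is blank, which commonly happens when a CSV was saved together with its row index. The sketch below assumes the file contains such a column (the file itself is not shown here, so this is a guess) and uses an in-memory string, `csv_text`, as a stand-in for the real file. Passing `index_col=0` asks `read_csv()` to use the first column as the row index instead of treating it as data.

```python
import io
import pandas as pd

# Stand-in for the CSV file: note the blank header before "City".
csv_text = """,City,Population,House Price
0,NYC,8500000,849000
1,Los Angeles,4000000,775000
2,Chicago,2700000,245000
"""

# Default behaviour: the header-less column is kept and named "Unnamed: 0".
plain = pd.read_csv(io.StringIO(csv_text))
print(list(plain.columns))    # ['Unnamed: 0', 'City', 'Population', 'House Price']

# index_col=0: the first column becomes the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))  # ['City', 'Population', 'House Price']
```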


As the `df` output above shows, we have read the complete CSV file. If you want to take a small peek at the data, the `head()` method of a Data Frame will show the first 5 rows by default.

In [8]:
df.head()

Unnamed: 0,City,Population,House Price
0,NYC,8500000,849000
1,Los Angeles,4000000,775000
2,Chicago,2700000,245000
3,Houston,2300000,282000
4,Philadelphia,1600000,165000


If you want to peek at a specific number of rows, put that number inside the parentheses of `head()`. This also works with the `tail()` method, which returns the last rows instead (also 5 by default).

In [9]:
df.head(2)

Unnamed: 0,City,Population,House Price
0,NYC,8500000,849000
1,Los Angeles,4000000,775000
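
Since `tail()` is mentioned but not run in this notebook, here is a small sketch of it next to `head()`. The ten-row frame `numbers` is a hypothetical example made up for this illustration, chosen so the row positions are easy to read.

```python
import pandas as pd

# Hypothetical ten-row frame: the column "n" simply holds 0..9.
numbers = pd.DataFrame({"n": range(10)})

first_two = numbers.head(2)     # rows 0 and 1
last_three = numbers.tail(3)    # rows 7, 8 and 9
last_default = numbers.tail()   # no argument: the last 5 rows

print(first_two["n"].tolist())   # [0, 1]
print(last_three["n"].tolist())  # [7, 8, 9]
print(len(last_default))         # 5
```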


You can add a column to the Data Frame by creating a list with one element per row. Create a new column with a header named "State", and set it equal to that list.

In [3]:
states = ["NY", "CA", "IL", "TX", "PA", "AZ", "TX", "CA", "TX", "CA"]
df["State"] = states
df.head()

Unnamed: 0,City,Population,House Price,State
0,NYC,8500000,849000,NY
1,Los Angeles,4000000,775000,CA
2,Chicago,2700000,245000,IL
3,Houston,2300000,282000,TX
4,Philadelphia,1600000,165000,PA
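
The list must have exactly one value per row. As a minimal sketch of what happens otherwise, the example below uses a three-row stand-in frame, `cities`, and a hypothetical column name, `"Too Short"`. Assigning a list of the wrong length raises a `ValueError` and leaves the Data Frame unchanged.

```python
import pandas as pd

# Three-row stand-in for the cities table.
cities = pd.DataFrame({"City": ["NYC", "Los Angeles", "Chicago"]})

# One state per row: the assignment succeeds.
cities["State"] = ["NY", "CA", "IL"]

# Only two values for three rows: Pandas refuses rather than guessing.
try:
    cities["Too Short"] = ["NY", "CA"]
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
print(list(cities.columns))  # ['City', 'State']
```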


Of course, you can make your own Data Frame without reading it from a CSV file. Start with an empty Data Frame and make a bunch of lists, one per column, all of the same length!

In [4]:
dafr = pd.DataFrame()
dafr["Name"] = ["Bob", "Alice", "Mark", "Sally"]
dafr["Grade"] = ["B", "A", "A", "A"]
dafr

Unnamed: 0,Name,Grade
0,Bob,B
1,Alice,A
2,Mark,A
3,Sally,A
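
To go the other way and turn a Data Frame into CSV, Pandas provides the `to_csv()` method. The sketch below rebuilds the same table with a dict passed to the constructor (an equivalent alternative to adding columns one at a time), converts it to CSV text, and reads it back. The variable names and the file name in the final comment are illustrative assumptions.

```python
import io
import pandas as pd

# Same data as the cell above, as a dict mapping column name -> list of values.
grades = pd.DataFrame({
    "Name": ["Bob", "Alice", "Mark", "Sally"],
    "Grade": ["B", "A", "A", "A"],
})

# With no path argument, to_csv() returns the CSV text as a string.
# index=False leaves the row index out of the output.
csv_text = grades.to_csv(index=False)
print(csv_text)

# Reading the text back gives the same table.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored["Name"].tolist())  # ['Bob', 'Alice', 'Mark', 'Sally']

# To write a real file instead, pass a path (hypothetical name):
# grades.to_csv("grades.csv", index=False)
```

Leaving out `index=False` would also write the row index as an extra first column with a blank header.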


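Next, we will load an Excel file using the `read_excel()` function. Reading `.xlsx` files requires an Excel engine such as `openpyxl` to be installed alongside Pandas.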

In [6]:
loc = "../DataSets/Simple/top-ten-happy-countries-forbes.xlsx"
datfra = pd.read_excel(loc)
datfra

Unnamed: 0,Ranking,Country,Happy Score
0,1,Finland,7.632
1,2,Norway,7.594
2,3,Denmark,7.555
3,4,Iceland,7.495
4,5,Switzerland,7.487
5,6,Netherlands,7.441
6,7,Canada,7.382
7,8,New Zealand,7.324
8,9,Sweden,7.314
9,10,Australia,7.272
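
Writing a Data Frame to Excel works much the same way through the `to_excel()` method. The following is a minimal sketch, assuming a recent Pandas version and an installed Excel engine such as `openpyxl`. It writes to an in-memory buffer rather than a real file so that nothing is left on disk; the variable names and the file name in the final comment are illustrative assumptions. If no engine is available, the `ImportError` is caught and the round trip is skipped.

```python
import io
import pandas as pd

# First three rows of the table above, typed in by hand.
happiest = pd.DataFrame({
    "Ranking": [1, 2, 3],
    "Country": ["Finland", "Norway", "Denmark"],
    "Happy Score": [7.632, 7.594, 7.555],
})

buffer = io.BytesIO()  # in-memory stand-in for an .xlsx file
try:
    happiest.to_excel(buffer, index=False)  # index=False omits the row index
    buffer.seek(0)                          # rewind before reading back
    restored = pd.read_excel(buffer)
except ImportError:
    # No Excel engine (for example openpyxl) is installed.
    restored = None

if restored is not None:
    print(restored["Country"].tolist())  # ['Finland', 'Norway', 'Denmark']

# To write a real file instead, pass a path (hypothetical name):
# happiest.to_excel("happiest.xlsx", index=False)
```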


In the next notebook, we will go over some data manipulations with Data Frames!