# DataFrames and Series

## What are data frames?
A CSV is basically an Excel table: just rows and columns of data. A data frame is basically the same thing. It's a data structure that holds those rows and columns, along with a bunch of functionality that lets us manipulate that data. Sometimes it's easier to think of it like a dictionary:
```
users = {
  'username': ["Knguyen44", "AbagailW3nd"],
  'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com"]
}
```
Yeah, it's kind of like this: each key is a column name, and each list holds that column's values, one per row. Honestly, if you're familiar with SQL tables, Excel, etc., this should be easy to picture. Our data frames are just these tables with extra cool features.

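To make the dictionary analogy concrete, here's a small sketch using the same `users` dictionary. pandas turns each key into a column, and `to_dict(orient="list")` turns the frame back into the same key-to-list shape. The name `roundTrip` is just an illustrative variable.

```python
import pandas as pd

users = {
    'username': ["Knguyen44", "AbagailW3nd"],
    'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com"]
}

usersDF = pd.DataFrame(users)

# Each dictionary key became a column name...
print(list(usersDF.columns))  # ['username', 'email']

# ...and each list became that column's values, one row per list position.
print(usersDF)

# Converting back gives the original key -> list-of-values shape (roundTrip is a hypothetical name).
roundTrip = usersDF.to_dict(orient="list")
print(roundTrip == users)  # True
```
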
## What are series?
This is just an 'array' of data, or a single column of data, so you could have a series for 'username'. This lets you easily access all of the values in the username column, and it also adds cool new functionality for manipulating that column's values. Of course you can get more complex with it (a single row can come back as a series too, as in Ex.3 below), but this is the base idea. You could think of a data frame as a collection of series that are teaming up.

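Here's a short sketch of the "series teaming up" idea, reusing the same two users. Two named series are joined side by side into a data frame. Pulling a column back out gives a `pd.Series`, and that series brings its own methods, such as the `.str` string accessor. The names `usernames` and `emails` are just illustrative.

```python
import pandas as pd

# Two hypothetical named series holding one column's worth of values each.
usernames = pd.Series(["Knguyen44", "AbagailW3nd"], name="username")
emails = pd.Series(["knguyen44@gmail.com", "abbyWendel@outlook.com"], name="email")

# Series that share the same index can team up into a DataFrame, side by side.
usersDF = pd.concat([usernames, emails], axis=1)
print(usersDF.columns.tolist())  # ['username', 'email']

# Selecting a single column hands back a Series again.
usernameSeries = usersDF["username"]
print(type(usernameSeries).__name__)  # Series

# Series come with extra functionality, e.g. vectorized string methods.
print(usernameSeries.str.lower().tolist())  # ['knguyen44', 'abagailw3nd']
```
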
```python
# Ex.1: Creating a data frame from a dictionary. This is an alternative to reading
# a CSV and creating the data frame that way: each key becomes a column name and
# each list holds that column's values.
import pandas as pd

users = {
    'username': ["Knguyen44", "AbagailW3nd"],
    'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com"]
}
usersDF = pd.DataFrame(users)
```

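Ex.1 mentions reading a CSV as the other way to build a data frame. This sketch shows that both routes give the same table. The CSV text is a hypothetical inline string read through `io.StringIO` rather than a real file, and `csvText`, `fromDict` and `fromCsv` are just illustrative names.

```python
import io

import pandas as pd

users = {
    'username': ["Knguyen44", "AbagailW3nd"],
    'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com"]
}

# Hypothetical CSV contents with the same two columns, kept in memory instead of on disk.
csvText = """username,email
Knguyen44,knguyen44@gmail.com
AbagailW3nd,abbyWendel@outlook.com
"""

fromDict = pd.DataFrame(users)
fromCsv = pd.read_csv(io.StringIO(csvText))

# Raises an AssertionError if the frames differ in values, columns, index, or dtypes.
pd.testing.assert_frame_equal(fromDict, fromCsv)
print(fromCsv.shape)  # (2, 2)
```
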
```python
# Ex.2: Selecting specific columns.
#
# In the first line we get back a series: all of the values in the 'username' column.
# In the second line we want both the 'username' and 'email' columns. Since we're
# selecting more than one column (a list of names inside the brackets), we get back
# a data frame instead of a series.
#
# NOTE: We're getting back

# Could also be written as usersDF.username, but be careful: a column can have the
# same name as a data frame method, e.g. 'count'.
usernameSeries = usersDF["username"]
usernameAndEmailDataFrame = usersDF[["username", "email"]]
```

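Here's the series-vs-data-frame rule from Ex.2 and the `count` pitfall from its comment in action. `postsDF` is a hypothetical frame that has a column literally named `count`.

```python
import pandas as pd

usersDF = pd.DataFrame({
    'username': ["Knguyen44", "AbagailW3nd"],
    'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com"]
})

# One column name -> Series; a list of names (even a list of one) -> DataFrame.
print(type(usersDF["username"]).__name__)    # Series
print(type(usersDF[["username"]]).__name__)  # DataFrame

# Hypothetical frame whose column name clashes with the DataFrame.count method.
postsDF = pd.DataFrame({'count': [3, 5]})

# Attribute access finds the method, not the column...
print(callable(postsDF.count))  # True
# ...while bracket access always means the column.
print(postsDF["count"].tolist())  # [3, 5]
```
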
```python
# Ex.3: Selecting rows by integer location with iloc.
#
# firstUser is a series holding the first row's values, things such as 'username'
# and 'email'. The series' index labels are the column names, so 'username' and
# 'email' sit at integer positions 0 and 1 respectively. This is just simple array
# indexing.
#
# For emailRows I only want the email, so I choose column position 1 for rows 0 and 1.
# We get back a data frame because both the row selector and the column selector are
# lists; usersDF.iloc[[0, 1], 1] would instead return a series of the two emails.
firstUser = usersDF.iloc[0]
emailRows = usersDF.iloc[[0, 1], [1]]
```

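This sketch shows how the shape of what `iloc` returns depends on whether you pass a single integer or a list, using the same two users.

```python
import pandas as pd

usersDF = pd.DataFrame({
    'username': ["Knguyen44", "AbagailW3nd"],
    'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com"]
})

firstUser = usersDF.iloc[0]
# The row series is labeled by column name, so both of these give the email.
print(firstUser["email"])  # knguyen44@gmail.com
print(firstUser.iloc[1])   # knguyen44@gmail.com

# Scalar column position -> Series of emails; list of positions -> DataFrame.
print(type(usersDF.iloc[[0, 1], 1]).__name__)    # Series
print(type(usersDF.iloc[[0, 1], [1]]).__name__)  # DataFrame

# A single row position and a single column position give back the value itself.
print(usersDF.iloc[1, 1])  # abbyWendel@outlook.com
```
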

```python
# Ex.4: Selecting rows and columns using labels with loc.
#
# Very similar to iloc, except that loc works with labels, so we can name the columns
# as strings. The rows are selected by label too; with the default index those labels
# happen to be 0 and 1. Remember that indexes aren't just columns, and this will get
# a little more complex later on.
emailAndUsernameDF = usersDF.loc[[0, 1], ["email", "username"]]
```

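The note in Ex.4 that indexes aren't just columns is easier to see once the rows have labels other than 0, 1, 2, and so on. In this sketch, `set_index` promotes the `username` column to the row index. After that, `loc` selects rows by username while `iloc` still works by position. `byUsername` is just an illustrative name.

```python
import pandas as pd

usersDF = pd.DataFrame({
    'username': ["Knguyen44", "AbagailW3nd"],
    'email': ["knguyen44@gmail.com", "abbyWendel@outlook.com"]
})

# Use the usernames as row labels instead of the default 0, 1 (byUsername is a hypothetical name).
byUsername = usersDF.set_index("username")

# loc now takes a username for the row and a column name for the column.
print(byUsername.loc["AbagailW3nd", "email"])  # abbyWendel@outlook.com

# iloc ignores labels and still counts positions from 0 ('email' is now column 0).
print(byUsername.iloc[1, 0])  # abbyWendel@outlook.com

# The old integer labels are gone, so byUsername.loc[0] would now raise a KeyError.
print(0 in byUsername.index)  # False
```
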


## Back to the StackOverflow dataset
Let's do a few things:

1. Get all of the values for the 'Hobbyist' column.
2. Slice out a range of rows and columns with `loc`.


```python
import pandas as pd

csvPath = "../data/survey_results_public.csv"
df = pd.read_csv(csvPath)

# Task 1: Get all values from the Hobbyist column. Remember that df.columns lets us
# see all of the columns that are available.
hobbyistData = df["Hobbyist"]

# Get the first 3 rows, labels 0 to 2 inclusive, and only the Hobbyist column
df.loc[0:2, "Hobbyist"]
```

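The survey CSV isn't bundled here, so this sketch builds a tiny stand-in `df` instead. It uses the Hobbyist and Employment values from the three rows in the Task 2 output below, then checks what Task 1's selections return. `firstThree` is just an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for the survey data: the first three rows' Hobbyist and
# Employment values, copied from the Task 2 output.
df = pd.DataFrame({
    "Hobbyist": ["Yes", "No", "Yes"],
    "Employment": [
        "Not employed, and not looking for work",
        "Not employed, but looking for work",
        "Employed full-time",
    ],
})

# df.columns lists the available column names.
print(df.columns.tolist())  # ['Hobbyist', 'Employment']

hobbyistData = df["Hobbyist"]
print(hobbyistData.tolist())  # ['Yes', 'No', 'Yes']

# loc with the label slice 0:2 includes label 2, so we get three values.
firstThree = df.loc[0:2, "Hobbyist"]
print(len(firstThree))  # 3
```
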

```python
# Task 2: Do more slicing. We get the rows with index labels 0, 1, and 2, and from those
# rows we get every column from Hobbyist to Employment, inclusive. Label slices are
# inclusive because it would be pretty annoying otherwise: to include the Employment
# column, you'd have to name whatever column comes after it.
df.loc[0:2, "Hobbyist":"Employment"]
```



|   | Hobbyist | OpenSourcer | OpenSource | Employment |
|---|---|---|---|---|
| 0 | Yes | Never | The quality of OSS and closed source software ... | Not employed, and not looking for work |
| 1 | No | Less than once per year | The quality of OSS and closed source software ... | Not employed, but looking for work |
| 2 | Yes | Never | The quality of OSS and closed source software ... | Employed full-time |

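Task 2's point about inclusive label slices is easiest to see next to `iloc`. Its integer slices follow normal Python rules and stop *before* the end. This sketch uses a stand-in frame with the four columns from the output above. The cell values are copied from that output, and the truncated OpenSource text is kept as displayed. `byLabel` and `byPosition` are just illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for the first three survey rows, copied from the output above.
df = pd.DataFrame({
    "Hobbyist": ["Yes", "No", "Yes"],
    "OpenSourcer": ["Never", "Less than once per year", "Never"],
    "OpenSource": ["The quality of OSS and closed source software ..."] * 3,
    "Employment": [
        "Not employed, and not looking for work",
        "Not employed, but looking for work",
        "Employed full-time",
    ],
})

# Label slices include both ends: rows 0-2 and Hobbyist through Employment.
byLabel = df.loc[0:2, "Hobbyist":"Employment"]
print(byLabel.shape)  # (3, 4)

# Integer slices exclude the end, so similar-looking numbers give back less.
byPosition = df.iloc[0:2, 0:3]
print(byPosition.shape)  # (2, 3)
print(byPosition.columns.tolist())  # ['Hobbyist', 'OpenSourcer', 'OpenSource']
```
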