# Manipulating data with Pandas! 🐼

([You can download the dataset here](https://www.kaggle.com/c/titanic/data).
[Here's a tutorial article](https://towardsdatascience.com/getting-started-to-data-analysis-with-python-pandas-with-titanic-dataset-a195ab043c77) where a lot of this came from.)


Pandas is a popular Python library for working with tabular data. A **library** is a collection of functions that someone else wrote and packaged up so that other people can import and use them.

The two primary components of pandas are the `Series` and the `DataFrame`.

A `Series` is essentially a single labeled column of values, and a `DataFrame` is a two-dimensional table (rows and columns) made up of a collection of `Series` that share the same row labels.


![A Series is a single column; a DataFrame is a table made of several Series](https://storage.googleapis.com/lds-media/images/series-and-dataframe.width-1200.png)
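To make the picture concrete, here is a minimal sketch that builds a `Series` and a `DataFrame` by hand. The class and fare values are copied from the first rows of the Titanic data printed further down; the names `fares` and `passengers` are only illustrative.

```python
import pandas as pd

# A Series is one labeled column of values
fares = pd.Series([7.25, 71.2833, 7.925], name="Fare")

# A DataFrame is a table whose columns are Series sharing one row index
passengers = pd.DataFrame({
    "Pclass": pd.Series([3, 1, 3]),
    "Fare": fares,
})

print(type(passengers["Fare"]))  # <class 'pandas.core.series.Series'>
print(passengers.shape)          # (3, 2)
```

Pulling a single column back out of the `DataFrame` gives you a `Series` again, which is exactly the relationship shown in the diagram.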

```python
# We need this line to access the pandas functions
import pandas as pd

# Load the Titanic CSV straight from a URL into a DataFrame
titanic = pd.read_csv("https://gist.githubusercontent.com/mmcghee18/b5ed9190e773d2e75b4bc3363f012866/raw/430bebedc223e4371770aa4f9b45eea8ae9474dd/titanic.csv")
```
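The cell above reads the CSV directly from a URL. If you downloaded the file from Kaggle instead, `pd.read_csv` also accepts a local path (for example `pd.read_csv("train.csv")`, assuming that is the name you saved it under) or any file-like object. The sketch below uses an in-memory string via `io.StringIO` so it runs without a download; the three rows and six columns are copied from the `head()` output shown further down, and `titanic_small` is just an illustrative name.

```python
import io
import pandas as pd

# An in-memory stand-in for a CSV file: a few columns and rows
# copied from the head() output. Empty fields become NaN.
csv_text = """PassengerId,Survived,Pclass,Fare,Cabin,Embarked
1,0,3,7.25,,S
2,1,1,71.2833,C85,C
3,1,3,7.925,,S
"""

titanic_small = pd.read_csv(io.StringIO(csv_text))
print(titanic_small)
```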

## Getting info about your data

Try these out:
* `titanic.head()`
* `titanic.tail()`
* `titanic.shape`
* `titanic.info()`
* `titanic.describe()`

```python
# your code goes here
print(titanic.head())
```

```
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S
3            4         1       1  ...  53.1000  C123         S
4            5         0       3  ...   8.0500   NaN         S

[5 rows x 12 columns]
```
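The cell above only tries `head()`. Here is a hedged sketch of the other calls in the list, run on a hand-built stand-in called `sample` (a hypothetical name) that holds just the five rows and six of the twelve columns printed above, so it runs without downloading anything. On the real `titanic` DataFrame the same calls work unchanged.

```python
import pandas as pd

# Hypothetical stand-in: the five rows and six of the twelve columns
# printed by titanic.head() above. None marks a missing Cabin value.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "Cabin": [None, "C85", None, "C123", None],
    "Embarked": ["S", "C", "S", "S", "S"],
})

print(sample.shape)       # (5, 6): an attribute, so no parentheses
print(sample.tail(2))     # last 2 rows; with no argument, tail() shows 5
sample.info()             # prints column names, non-null counts and dtypes itself
print(sample.describe())  # count, mean, std, min, quartiles, max of numeric columns
```

Note that `shape` is an attribute, while the others are methods called with `()`, and that `info()` prints its report directly instead of returning it.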


## Accessing data

Get a column: `dataframe['column_name']`

Get multiple columns: `dataframe[['name1', 'name2']]`
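A quick way to see the difference between the two: single brackets give you one column as a `Series`, while a list of names inside the brackets gives you a smaller `DataFrame`. The mini-table `sample` below is hypothetical, built from values in the `head()` output above.

```python
import pandas as pd

# Hypothetical mini-table using values from the head() output
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Pclass": [3, 1, 3],
    "Fare": [7.25, 71.2833, 7.925],
})

one_col = sample["Fare"]               # single brackets -> a Series
two_cols = sample[["Pclass", "Fare"]]  # a list inside brackets -> a DataFrame

print(type(one_col).__name__)     # Series
print(type(two_cols).__name__)    # DataFrame
print(two_cols.columns.tolist())  # ['Pclass', 'Fare']
```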

Get a row:
* `dataframe.loc[1]` - **loc**ates by row label (the value shown in the index)
* `dataframe.iloc[1]` - **loc**ates by **i**nteger position (counting from 0)
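On a freshly loaded file the row labels are 0, 1, 2, ..., so `loc[1]` and `iloc[1]` happen to return the same row. They diverge once labels and positions stop lining up, for example after sorting. A minimal sketch, again on a hypothetical `sample` table built from the `head()` values above:

```python
import pandas as pd

# Hypothetical mini-table; rows get the default labels 0-4
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Sorting moves rows around but each row keeps its original label,
# so labels no longer match positions (label order is now 1, 3, 4, 2, 0)
by_fare = sample.sort_values("Fare", ascending=False)

print(by_fare.loc[1])   # the row LABELED 1 -> PassengerId 2
print(by_fare.iloc[1])  # the row at POSITION 1 (2nd row) -> PassengerId 4
```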

Get a range of rows:
- `dataframe[0:5]` - by position (doesn't include the end index)
- `dataframe.loc[0:5]` - by row label
- `dataframe.iloc[0:5]` - by integer position
- Fun fact: `loc` DOES include the end label in its range, while `iloc` follows normal Python slicing and does not
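Checking the end-index rule directly: on a table with the default labels 0 to 9 (a hypothetical `sample` with ten made-up passenger IDs), plain slicing and `iloc` stop before position 5, while `loc` includes label 5.

```python
import pandas as pd

# Hypothetical table of ten rows with the default labels 0-9
sample = pd.DataFrame({"PassengerId": range(1, 11)})

print(len(sample[0:5]))       # 5: positions 0-4, end excluded
print(len(sample.iloc[0:5]))  # 5: positions 0-4, end excluded
print(len(sample.loc[0:5]))   # 6: labels 0-5, end INCLUDED
```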

```python
import pandas as pd

titanic = pd.read_csv("https://gist.githubusercontent.com/mmcghee18/b5ed9190e773d2e75b4bc3363f012866/raw/430bebedc223e4371770aa4f9b45eea8ae9474dd/titanic.csv")

# Print a column
# Print multiple columns
# Print the 3rd row (index 2!)
# Print the row for the passenger named 'Skoog, Miss. Margit Elizabeth'
```

```
     PassengerId  Survived  Pclass  ...  Fare Cabin  Embarked
642          643         0       3  ...  27.9   NaN         S

[1 rows x 12 columns]
```


### Conditional selections:

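How would we filter our DataFrame to show only people in first class, or only people who paid below a certain fare?

We can take a column and apply a Boolean condition to it: a test that is either true or false for each row. Here's an example:

`titanic[titanic['Fare'] < 50]`

You can read that as: select the rows where the Fare column is less than 50. The inner part, `titanic['Fare'] < 50`, produces a Series of `True`/`False` values (one per row), and putting it inside `titanic[...]` keeps only the rows marked `True`.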
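A minimal sketch of those steps, plus one way to ask the "first class or below a certain fare" question by combining conditions. It uses a hypothetical `sample` table built from the `head()` values shown earlier, and the cutoff of 7.5 is arbitrary. Note that pandas uses `|` (or) and `&` (and) here rather than Python's `or`/`and`, and each condition needs its own parentheses.

```python
import pandas as pd

# Hypothetical mini-table built from the head() values shown earlier
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Pclass": [3, 1, 3, 1, 3],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# The condition by itself is a Series of True/False, one value per row
cheap = sample["Fare"] < 50
print(cheap.tolist())  # [True, False, True, False, True]

# Putting that Series inside [] keeps only the rows marked True
print(sample[cheap])

# Combine conditions with | (or) and & (and); parentheses are required
# because | and & bind more tightly than < and ==
first_or_cheap = sample[(sample["Pclass"] == 1) | (sample["Fare"] < 7.5)]
print(first_or_cheap["PassengerId"].tolist())  # [1, 2, 4]
```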

```python
# Print some conditional selections!
```