# Operations

pandas has many useful operations that don't fit neatly into any single category. This lecture walks through several of them:

### Import `pandas` Library

```python
import pandas as pd
```

## Load Data

<hr>

##### Mount Drive - **Google Colab Only Step**

When using Google Colab, you must mount your Google Drive before you can access files stored on it. Run the Python cell below, click the link it generates, and paste the authorization code into the cell.

```python
from google.colab import drive
drive.mount('/content/drive')
```



##### Change Directory To Access The Dependent Files - **Google Colab Only Step**

```python
# Set to "student" if your copy lives in "Colab Notebooks"; otherwise the shared drive is used
directory = "teacher"
if directory == "student":
    %cd drive/Colab\ Notebooks/intro-to-python/
else:
    %cd drive/Shared\ drives/Rubrik/Data\ Science\ Track/intro-to-python
```

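`%cd` is an IPython magic, so it only works inside a notebook, and it prints an `[Errno 2] No such file or directory` message when the target folder is missing. If you prefer to check before changing directory, here is a minimal plain-Python sketch using `os.chdir` and `pathlib`. The helper `change_to_first_existing` is hypothetical (not part of any library). Note that spaces in ordinary Python strings don't need the `\ ` escaping used with `%cd`.

```python
import os
from pathlib import Path


def change_to_first_existing(candidates):
    """Hypothetical helper: chdir into the first directory in `candidates` that exists."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_dir():
            os.chdir(path)
            return path.resolve()
    # Nothing matched, so fail loudly instead of silently staying put
    raise FileNotFoundError(f"None of these directories exist: {candidates}")


# Example usage on Colab (paths are relative to /content, as in the cell above):
# change_to_first_existing([
#     "drive/Shared drives/Rubrik/Data Science Track/intro-to-python",
#     "drive/Colab Notebooks/intro-to-python",
# ])
```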


#### Load data into a variable called `df`



```python
# I've given you the path this time!
df = pd.read_csv('./data/rhode-island-police-stops.csv')
```

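If you want to experiment before the CSV file is available, `pd.read_csv` also accepts a file-like object. The sketch below reads a few hypothetical rows (made up, using some of the dataset's column names) from an in-memory string; empty fields are read as missing values (`NaN`).

```python
import io

import pandas as pd

# Hypothetical sample rows; the second stop has every field after the timestamp empty
csv_text = """date_and_time,driver_gender,driver_age,driver_race,violation
2005-01-02 01:55:00,M,20.0,White,Speeding
2005-01-04 11:30:00,,,,
2005-01-06 01:30:00,M,27.0,Black,Equipment
"""

# io.StringIO makes the string behave like an open file
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)         # (3, 5)
print(sample.isna().sum())  # missing values per column
```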


<hr>
<br>
<br>

## Unique Values
Unique values matter most for columns with `object` or `category` dtypes. For a column such as `driver_race`, you'd want to know three things:

1) How many unique races the dataset contains

2) Which unique races the dataset contains

3) How many rows belong to each race
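Since the text mentions the `category` dtype, here is a minimal sketch on hypothetical toy data (not the real dataset): converting a text column with `.astype("category")` stores each distinct value once, and `.cat.categories` lists them. Missing values stay missing rather than becoming a category.

```python
import pandas as pd

# Hypothetical toy column with values like those in `driver_race`
races = pd.Series(["White", "Black", "White", None, "Hispanic"])

# Convert to pandas' categorical dtype
races_cat = races.astype("category")

print(races_cat.dtype)                 # category
print(list(races_cat.cat.categories))  # ['Black', 'Hispanic', 'White']
print(races_cat.nunique())             # 3
```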

```python
# returns the number of unique categories in `driver_race`
df['driver_race'].nunique()
```

5
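`nunique()` ignores missing values by default (`dropna=True`), which is why it reports 5 even though the `unique()` output in the next cell also contains `nan`. A minimal sketch on a hypothetical toy Series:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data containing one missing value
races = pd.Series(["White", np.nan, "Black", "White"])

print(races.nunique())              # 2  (NaN ignored by default)
print(races.nunique(dropna=False))  # 3  (NaN counted as its own value)
```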

```python
# returns an array of the unique values in 'driver_race', including NaN
df['driver_race'].unique()
```

array(['White', nan, 'Black', 'Hispanic', 'Asian', 'Other'], dtype=object)
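As the `array(...)` output shows, `.unique()` returns a NumPy array (not a Python list), keeps values in order of first appearance, and includes `nan`. A minimal sketch on hypothetical toy data showing how to drop missing values first and get a plain list:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; dtype="object" matches the dtype shown above
races = pd.Series(["White", np.nan, "Black", "White", "Asian"], dtype="object")

with_nan = races.unique()              # NumPy array, order of first appearance
without_nan = races.dropna().unique()  # remove missing values before taking uniques
as_list = without_nan.tolist()         # convert to a plain Python list

print(as_list)  # ['White', 'Black', 'Asian']
```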

```python
# returns the number of rows in each category, most frequent first (NaN excluded by default)
df['driver_race'].value_counts()
```

White       344734
Black        68579
Hispanic     53125
Asian        12826
Other         1344
Name: driver_race, dtype: int64
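`value_counts()` has two parameters that are often handy: `normalize=True` returns proportions instead of counts, and `dropna=False` adds a row for missing values (which are dropped by default, as in the output above). A minimal sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value
races = pd.Series(["White", "White", "Black", np.nan], dtype="object")

counts = races.value_counts()                    # White 2, Black 1 (NaN dropped)
shares = races.value_counts(normalize=True)      # White ~0.667, Black ~0.333
with_missing = races.value_counts(dropna=False)  # also includes NaN with a count of 1

print(counts)
print(shares)
print(with_missing)
```

With `normalize=True` and the default `dropna=True`, the proportions are computed over the non-missing rows only.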

### After running `.value_counts()` on the `driver_race` column, answer the following questions.

<hr>

#### 1) What is the most common race in `driver_race`?

<hr>

**Double-click** me to fill in text answer here!

<hr>

#### 2) What is the least common race in `driver_race`?

<hr>

**Double-click** me to fill in text answer here!

<hr>

#### 3) How many `Hispanic` drivers are in the dataframe?

<hr>

**Double-click** me to fill in text answer here!
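Once you've read the answers off the output, you can also check them in code. A minimal sketch on hypothetical toy data: on the Series returned by `value_counts()`, `idxmax()` and `idxmin()` return the label with the largest and smallest count, and indexing by label returns a single count.

```python
import pandas as pd

# Hypothetical toy data; the same calls work on df['driver_race'].value_counts()
counts = pd.Series(["White", "White", "White", "Black", "Hispanic", "Hispanic"]).value_counts()

most_common = counts.idxmax()    # 'White'
least_common = counts.idxmin()   # 'Black'
n_hispanic = counts["Hispanic"]  # 2

print(most_common, least_common, n_hispanic)
```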

<hr>
<br>
<br>

## Sorting and Ordering a DataFrame

Look at the first 5 rows of `df` (`.head()` shows 5 rows by default) and compare them to the first 10 rows of the sorted `df` below.

```python
df.head()
```

| | date_and_time | police_department | driver_gender | driver_age_raw | driver_age | driver_race | violation | search_conducted | search_type | contraband_found | stop_outcome | is_arrested | stop_duration | out_of_state | drugs_related_stop | district |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2005-01-02 01:55:00 | 600 | M | 1985.0 | 20.0 | White | Speeding | False | | False | Citation | False | 0-15 Min | False | False | Zone K1 |
| 1 | 2005-01-02 20:30:00 | 500 | M | 1987.0 | 18.0 | White | Speeding | False | | False | Citation | False | 16-30 Min | False | False | Zone X4 |
| 2 | 2005-01-04 11:30:00 | 0 | | | | | | False | | False | | | | | False | Zone X1 |
| 3 | 2005-01-04 12:55:00 | 500 | M | 1986.0 | 19.0 | White | Equipment | False | | False | Citation | False | 0-15 Min | False | False | Zone X4 |
| 4 | 2005-01-06 01:30:00 | 500 | M | 1978.0 | 27.0 | Black | Equipment | False | | False | Citation | False | 0-15 Min | False | False | Zone X4 |
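`.head()` takes an optional number of rows `n` (default 5), and `.tail()` works the same way from the end. A minimal sketch on a hypothetical toy DataFrame:

```python
import pandas as pd

# Hypothetical toy DataFrame with 20 rows
toy = pd.DataFrame({"driver_age": range(20)})

print(len(toy.head()))    # 5  (default n=5)
print(len(toy.head(10)))  # 10
print(len(toy.tail(3)))   # 3  (last three rows)
```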


The cell below orders all rows in the dataframe by `driver_age` (smallest first, the default), then lists only the first 10. Each row keeps its original index label.

```python
df.sort_values(by='driver_age').head(10)
```

| | date_and_time | police_department | driver_gender | driver_age_raw | driver_age | driver_race | violation | search_conducted | search_type | contraband_found | stop_outcome | is_arrested | stop_duration | out_of_state | drugs_related_stop | district |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 144564 | 2008-05-25 01:06:00 | 500 | F | 1993.0 | 15.0 | Hispanic | Registration/plates | True | Incident to Arrest | True | Arrest Driver | True | 0-15 Min | False | False | Zone X4 |
| 98159 | 2007-06-11 12:30:00 | 200 | M | 1992.0 | 15.0 | White | Speeding | False | | False | Citation | False | 0-15 Min | True | False | Zone X3 |
| 418834 | 2014-02-19 22:01:00 | 200 | M | 1999.0 | 15.0 | White | Moving violation | False | | False | Citation | False | 30+ Min | False | False | Zone X3 |
| 266303 | 2011-02-25 15:47:00 | 500 | F | 1996.0 | 15.0 | White | Registration/plates | False | | False | Citation | False | 0-15 Min | False | False | Zone K2 |
| 299666 | 2011-11-15 00:18:00 | 600 | M | 1996.0 | 15.0 | Hispanic | Moving violation | False | | False | Arrest Driver | True | 30+ Min | False | False | Zone K1 |
| 255144 | 2010-11-10 19:39:00 | 300 | F | 1995.0 | 15.0 | White | Moving violation | False | | False | Arrest Driver | True | 0-15 Min | False | False | Zone K3 |
| 255146 | 2010-11-10 19:39:00 | 200 | F | 1995.0 | 15.0 | White | Moving violation | False | | False | Citation | False | 16-30 Min | False | False | Zone X3 |
| 101424 | 2007-07-04 00:40:00 | 300 | F | 1992.0 | 15.0 | White | Moving violation | False | | False | Arrest Driver | True | 0-15 Min | False | False | Zone K3 |
| 68780 | 2006-12-07 03:35:00 | 500 | M | 1991.0 | 15.0 | White | Moving violation | False | | False | Citation | False | 0-15 Min | False | False | Zone X4 |
| 57818 | 2006-09-30 01:30:00 | 500 | M | 1991.0 | 15.0 | Black | Moving violation | False | | False | Arrest Driver | True | 30+ Min | False | False | Zone X4 |
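`sort_values` has a few options worth knowing: `ascending=False` reverses the order, passing a list to `by` breaks ties using later columns, and `na_position` controls where missing values go (`"last"` by default). A minimal sketch on hypothetical toy stops (column names borrowed from the dataset, values made up):

```python
import numpy as np
import pandas as pd

# Hypothetical toy stops
stops = pd.DataFrame({
    "driver_age": [27.0, 15.0, np.nan, 15.0, 20.0],
    "violation": ["Equipment", "Speeding", "Speeding", "Moving violation", "Speeding"],
})

# Oldest drivers first; the NaN age still goes last (na_position="last" by default)
oldest_first = stops.sort_values(by="driver_age", ascending=False)

# Sort by age, then break ties on age alphabetically by violation
by_age_then_violation = stops.sort_values(by=["driver_age", "violation"], ascending=[True, True])
print(by_age_then_violation.index.tolist())  # [3, 1, 4, 0, 2]

# Put rows with a missing age at the top instead
nan_first = stops.sort_values(by="driver_age", na_position="first")

# Sorting keeps the original index labels; reset them to 0..n-1 if needed
renumbered = oldest_first.reset_index(drop=True)
```

Keeping the original index labels (as in the 144564, 98159, ... rows above) lets you trace a sorted row back to its position in the unsorted data; `reset_index(drop=True)` discards that link.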
