# Use the `pandas` module to access data from a variety of sources

In this example we're fetching an `.xlsx` file from the web, but you can also load `.csv` and other file types from disk.

By convention, `pandas` is imported `as pd`. Once you've done this, you call the library's functions as `pd.function()`.

In [1]:
import pandas as pd

### Define a URL or filepath and use `pd.read_csv()` or `pd.read_excel()`

By convention, the `DataFrame` that `pandas` creates is stored in a variable called `df`.

This convention only works when a single dataframe is in use. If you're working with several dataframes, give each one a more descriptive name.
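
As a rough sketch of what descriptive naming could look like, the cell below builds two tiny made-up tables and names them after the data they hold. The names `stops_2017`, `stops_2018` and `stops_by_year`, and the data itself, are invented for illustration and are not part of the NYPD dataset used in this notebook.

```python
import pandas as pd

# Two tiny, made-up tables standing in for separate data sources
stops_2017 = pd.DataFrame({"STOP_ID": [1, 2], "BORO": ["QUEENS", "BRONX"]})
stops_2018 = pd.DataFrame(
    {"STOP_ID": [1, 2, 3], "BORO": ["MANHATTAN", "BRONX", "BROOKLYN"]}
)

# A dict keyed by year is another option when the tables share a structure
stops_by_year = {2017: stops_2017, 2018: stops_2018}

# Print each year alongside the number of rows in its table
for year, frame in stops_by_year.items():
    print(year, len(frame))
```

Either approach keeps it clear which table a later operation is acting on.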

In [2]:
url = 'https://www1.nyc.gov/assets/nypd/downloads/excel/analysis_and_planning/stop-question-frisk/sqf-2018.xlsx'

df = pd.read_excel(url)
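
The same pattern should work for a file on disk: pass a path instead of a URL. The sketch below first writes a small CSV to a temporary folder so that there is something to read. The file name `stops_sample.csv`, its columns and the variable `csv_df` are assumptions made up for this example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Write a tiny CSV to a temporary folder so the example has a local file to read.
# The file name and its contents are invented for illustration.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "stops_sample.csv"
csv_path.write_text("STOP_ID,BORO\n1,MANHATTAN\n2,BRONX\n")

# Load the local file into a DataFrame, just as read_excel does for the URL
csv_df = pd.read_csv(csv_path)
print(csv_df)
```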

### Once you've created a dataframe you can use any `pandas` operation on it

#### Common initial steps include looking at the `.head()` or `.tail()` of the dataframe (by default, the first or last 5 rows of data)

In [3]:
df.head()

| | STOP_FRISK_ID | STOP_FRISK_DATE | Stop Frisk Time | YEAR2 | MONTH2 | DAY2 | STOP_WAS_INITIATED | RECORD_STATUS_CODE | ISSUING_OFFICER_RANK | ISSUING_OFFICER_COMMAND_CODE | ... | STOP_LOCATION_SECTOR_CODE | STOP_LOCATION_APARTMENT | STOP_LOCATION_FULL_ADDRESS | STOP_LOCATION_PREMISES_NAME | STOP_LOCATION_STREET_NAME | STOP_LOCATION_X | STOP_LOCATION_Y | STOP_LOCATION_ZIP_CODE | STOP_LOCATION_PATROL_BORO_NAME | STOP_LOCATION_BORO_NAME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2018-01-01 | 19:04:00 | 2018 | January | Monday | Based on C/W on Scene | APP | POM | 1 | ... | G | (null) | VARICK STREET && FRANKLIN STREET | (null) | VARICK STREET | 982327 | 201274 | (null) | PBMS | MANHATTAN |
| 1 | 2 | 2018-01-01 | 23:00:00 | 2018 | January | Monday | Based on Radio Run | APP | POM | 34 | ... | C | (null) | DYCKMAN STREET && POST AVENUE | (null) | DYCKMAN STREET | 1004892 | 253548 | (null) | PBMN | MANHATTAN |
| 2 | 3 | 2018-01-01 | 23:55:00 | 2018 | January | Monday | Based on Radio Run | APP | POM | 808 | ... | B | 4M | 2245 RANDALL AVENUE | (null) | RANDALL AVENUE | 1026706 | 237776 | (null) | PBBX | BRONX |
| 3 | 4 | 2018-01-01 | 03:23:00 | 2018 | January | Monday | Based on Radio Run | APP | POM | 63 | ... | B | (null) | EAST 38 STREET && AVENUE L | (null) | EAST 38 STREET | 1001347 | 166195 | (null) | PBBS | BROOKLYN |
| 4 | 5 | 2018-01-01 | 03:23:00 | 2018 | January | Monday | Based on Radio Run | APP | POM | 63 | ... | B | (null) | EAST 38 STREET && AVENUE L | (null) | EAST 38 STREET | 1001347 | 166195 | (null) | PBBS | BROOKLYN |
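
Both methods also accept a row count when 5 rows is too many or too few. The sketch below uses a small made-up dataframe called `sample` (not the stop-and-frisk data) so that it runs without a download.

```python
import pandas as pd

# Small made-up frame with ten rows
sample = pd.DataFrame({"row_id": range(10), "value": [x * x for x in range(10)]})

first_three = sample.head(3)  # the first three rows
last_two = sample.tail(2)     # the last two rows

print(first_three)
print(last_two)
```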


#### You can use `df.columns` or `df.dtypes` to learn more about the structure of your data

In [4]:
df.dtypes

STOP_FRISK_ID                              int64
STOP_FRISK_DATE                   datetime64[ns]
Stop Frisk Time                           object
YEAR2                                      int64
MONTH2                                    object
                                       ...      
STOP_LOCATION_X                            int64
STOP_LOCATION_Y                            int64
STOP_LOCATION_ZIP_CODE                    object
STOP_LOCATION_PATROL_BORO_NAME            object
STOP_LOCATION_BORO_NAME                   object
Length: 83, dtype: object
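
A few related attributes are often checked at the same time. The sketch below uses a small made-up dataframe, `sample`, whose column names are invented for illustration. It lists the column names, reports the shape as (rows, columns), and picks out the numeric columns with `select_dtypes`.

```python
import pandas as pd

# Made-up frame with one integer, one datetime and one text column
sample = pd.DataFrame(
    {
        "STOP_ID": [1, 2],
        "STOP_DATE": pd.to_datetime(["2018-01-01", "2018-01-02"]),
        "BORO": ["MANHATTAN", "BRONX"],
    }
)

print(sample.columns.tolist())  # column names as a plain list
print(sample.shape)             # (number of rows, number of columns)
print(sample.dtypes)            # one dtype per column

# Keep only the names of columns that hold numbers
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```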

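#### It can also be helpful sometimes to flip your data so that rows become columns

This is done using `df.T`, which "transposes" the data. With many columns and only a few rows, as in `df.head()`, the transposed view is often easier to scan.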

In [5]:
df.head().T

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| STOP_FRISK_ID | 1 | 2 | 3 | 4 | 5 |
| STOP_FRISK_DATE | 2018-01-01 00:00:00 | 2018-01-01 00:00:00 | 2018-01-01 00:00:00 | 2018-01-01 00:00:00 | 2018-01-01 00:00:00 |
| Stop Frisk Time | 19:04:00 | 23:00:00 | 23:55:00 | 03:23:00 | 03:23:00 |
| YEAR2 | 2018 | 2018 | 2018 | 2018 | 2018 |
| MONTH2 | January | January | January | January | January |
| ... | ... | ... | ... | ... | ... |
| STOP_LOCATION_X | 982327 | 1004892 | 1026706 | 1001347 | 1001347 |
| STOP_LOCATION_Y | 201274 | 253548 | 237776 | 166195 | 166195 |
| STOP_LOCATION_ZIP_CODE | (null) | (null) | (null) | (null) | (null) |
| STOP_LOCATION_PATROL_BORO_NAME | PBMS | PBMN | PBBX | PBBS | PBBS |
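
To make the effect of `.T` concrete, here is a minimal sketch on a made-up three-row dataframe called `sample` (the data is invented for illustration). The original column names become the row index, and the original row labels become the column headers.

```python
import pandas as pd

# Made-up frame: 3 rows, 2 columns
sample = pd.DataFrame(
    {"STOP_ID": [1, 2, 3], "BORO": ["MANHATTAN", "BRONX", "BROOKLYN"]}
)

# Transposing swaps rows and columns
transposed = sample.T

print(sample.shape)      # rows x columns before transposing
print(transposed.shape)  # the two numbers swap places
print(transposed)
```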
