# DiRienzo iSchool Demo Lesson
## Learning data import, cleaning, exploratory data analysis and visual analytics through the Tucson Crime Dataset

### Please work in a copy of this notebook alongside the lesson [here](https://dirienzo-demo-lesson.appspot.com/d-demo-lesson/unit?unit=1&lesson=4).



# Importing your data

## Learning objectives
- Be able to import libraries and understand the function of aliases
- Be able to import comma separated value files from the internet
- Use Python features to understand the layout of your data frame
- Decide whether you need to sample your data to ease processing


In [0]:
# importing libraries that we will use
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
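
The `as` keyword binds an imported module to a shorter name, so `pd`, `np`, and `plt` are simply conventional aliases for the modules imported above. A minimal sketch of what that means (the `values` list is illustrative only):

```python
import numpy
import numpy as np  # the same module under a shorter name

# Both names refer to the same module object
print(np is numpy)

# So these two calls do exactly the same thing
values = [1, 2, 3, 4]
print(numpy.mean(values), np.mean(values))
```

Any alias would work, but sticking to the common ones makes your code easier for others to read.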

In [0]:
crime = pd.read_csv('https://docs.google.com/spreadsheets/d/1OK77WfmiwLF6IEMAOS5bJbAlhzcqPUrXcHTx3NaU7wc/gviz/tq?tqx=out:csv')
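
`pd.read_csv` accepts a URL, a local file path, or any file-like object, and returns a data frame. The sketch below swaps the Google Sheets URL for an in-memory string so that it runs offline; the column names and rows are made up for illustration and are not taken from the Tucson data.

```python
import io

import pandas as pd

# A tiny CSV held in memory, standing in for a file on the internet
csv_text = """incident_id,day_of_week,hour_of_day
1,Monday,9
2,Friday,22
"""

# read_csv treats the first line as the header row by default
demo = pd.read_csv(io.StringIO(csv_text))
print(demo.shape)
print(demo.columns.tolist())
```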

In [0]:
# check the shape
crime.shape
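
The shape tells you how many rows and columns you are working with, which is a good moment to decide whether to sample. If a data frame is large enough to slow down your notebook, one option is to explore a random subset first. A minimal sketch on synthetic data (the frame `big` and the 10% fraction are assumptions chosen for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a large data frame (not the Tucson data)
rng = np.random.default_rng(0)
big = pd.DataFrame({"hour_of_day": rng.integers(0, 24, size=100_000)})
print(big.shape)  # (rows, columns)

# Keep a reproducible 10% random sample of the rows
small = big.sample(frac=0.1, random_state=42)
print(small.shape)
```

Setting `random_state` means you get the same sample every time you rerun the cell.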

In [0]:
# Let's look at the top 10 rows of data
crime.head(10)

In [0]:
# how would you look at the last 10 rows of data?  What method would you use?
crime.______(10)

# Data cleaning and exploratory data analysis

## Learning objectives
- View summary statistics associated with your data and understand what each reported statistic means
- Apply methods to check for NaN values and be able to make decisions on whether to delete or fill them
- Look for mislabeled or incorrect data points and be able to either fix or delete them
- Assess the continuity of the whole data set (e.g. which years are fully covered)
- Be able to summarize/visualize specific groups of data - e.g. counts of events during specific years

In [0]:
# first, describe the data
crime.describe(include = 'all')
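
With `include = 'all'`, `describe` reports different statistics depending on the column type: text columns get `unique`, `top` (the most frequent value) and `freq`, while numeric columns get `mean`, `std` and the quartiles. Statistics that do not apply to a column show up as NaN. A small sketch on a made-up frame:

```python
import pandas as pd

# Made-up rows mixing a text column and a numeric column
toy = pd.DataFrame({
    "day_of_week": ["Monday", "Monday", "Friday"],
    "hour_of_day": [9, 14, 22],
})

summary = toy.describe(include="all")
print(summary)

# 'top' applies to the text column, 'mean' to the numeric one
print(summary.loc["top", "day_of_week"])
print(summary.loc["mean", "hour_of_day"])
```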

## Dealing with NaN values

In [0]:
# are there nan values?
crime.isnull().sum()

In [0]:
# drop nan values
crime.dropna(inplace = True)
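
Dropping rows is the simplest choice, but it throws away every row that has even one missing value. Filling is the alternative when you would rather keep the rows. The sketch below compares the two on a made-up frame; the fill values (`'Unknown'` and the column median) are assumptions for illustration, not a recommendation for this data set.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value in each column
toy = pd.DataFrame({
    "day_of_week": ["Monday", None, "Friday", "Sunday"],
    "hour_of_day": [9.0, 14.0, np.nan, 22.0],
})

# Option 1: drop any row that contains a NaN
dropped = toy.dropna()
print(len(toy), "rows before,", len(dropped), "rows after dropping")

# Option 2: fill each column with a value that suits it
filled = toy.fillna({
    "day_of_week": "Unknown",
    "hour_of_day": toy["hour_of_day"].median(),
})
print(filled)
```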

## Exploring / changing strange values

In [0]:
# what are the unique levels in the day_of_week column?
crime.day_of_week.unique()
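
`unique()` lists the labels, but it does not tell you how often each one occurs. Counting them, or comparing them against the set of labels you expect, can make misspellings stand out. A sketch on a made-up series (the misspelling `'Fridayy'` is invented for this example):

```python
import pandas as pd

# Made-up labels containing one misspelling
days = pd.Series(["Friday", "Friday", "Fridayy", "Sunday", "Sunday", "Sunday"])

# Rare labels are often typos
counts = days.value_counts()
print(counts)

# Anything outside the expected set is suspect
valid = {"Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"}
suspect = sorted(set(days.unique()) - valid)
print(suspect)
```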

In [0]:
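# replace Moonday with Monday
# (assign the result back to the column; inplace on a selected column may not update the data frame)
crime['day_of_week'] = crime['day_of_week'].replace('Moonday', 'Monday')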

In [0]:
# write the code to change Thursdayy to Thursday

In [0]:
crime.hour_of_day.plot(kind = 'hist')
plt.show()

In [0]:
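# remove the invalid 26-hour data point (valid hours run from 0 to 23)
# .copy() makes the result an independent data frame, so adding columns later does not raise a warning
crime = crime[crime.hour_of_day != 26].copy()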
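
Filtering out one known bad value works when the histogram shows a single outlier. A more general check is to keep only the values that fall in the valid range. A sketch on made-up hours, using `Series.between`, which includes both endpoints by default:

```python
import pandas as pd

# Made-up hours with one impossible value
toy = pd.DataFrame({"hour_of_day": [0, 5, 23, 26, 12]})

# True where the hour is a valid clock hour
in_range = toy["hour_of_day"].between(0, 23)

# Inspect what would be removed before removing it
print(toy[~in_range])

cleaned = toy[in_range].copy()
print(cleaned["hour_of_day"].tolist())
```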

## Converting data types

In [0]:
# view datatypes
crime.dtypes

In [0]:
# let's make a date column as a specific datetime object
# (pandas infers the format by default; infer_datetime_format is deprecated in pandas 2.0+)
crime['date'] = pd.to_datetime(crime.incident_datetime)
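
If some entries cannot be parsed as dates, `pd.to_datetime` raises an error by default. Passing `errors='coerce'` turns those entries into `NaT` (the datetime version of NaN) so you can count and inspect them. A sketch on made-up strings:

```python
import pandas as pd

# Made-up timestamps, one of which is not a date
raw = pd.Series(["2015-03-01 14:30:00", "2016-07-04 09:15:00", "not a date"])

# Unparseable entries become NaT instead of raising an error
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed)
print("unparseable entries:", parsed.isna().sum())

# Datetime columns expose their parts through the .dt accessor
print(parsed.dropna().dt.year.tolist())
```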

In [0]:
# set date as the index
crime.set_index('date', inplace = True)
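
Once the index holds datetimes, its parts (such as `year` and `weekday`) are available directly on `crime.index`, and you can select rows with a date string. A sketch on a made-up frame with four dated rows:

```python
import pandas as pd

# Made-up incidents indexed by date (sorted, as a time index usually is)
toy = pd.DataFrame(
    {"parent_incident_type": ["Theft", "Assault", "Drugs", "Theft"]},
    index=pd.to_datetime(["2014-12-31", "2015-01-05", "2015-06-15", "2016-02-01"]),
)

# Date parts come straight from the index
print(toy.index.year.tolist())
print(toy.index.weekday.tolist())  # Monday = 0

# A year string selects every row from that year
only_2015 = toy.loc["2015"]
print(only_2015)
```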

## Considering the whole data set

In [0]:
crime.index.year.value_counts()

In [0]:
# use operators to get just years 2010-2016
crime = crime.loc[(crime.index.year <= 2016) & (crime.index.year >= 2010)]


In [0]:
# how do we filter by specific months? fill in the '_' blanks
crime_month = crime.loc[(crime.____.____ _= _) _ (crime.____.____ _= _)]

# this lists the unique month values left after filtering, which should all be between 6 and 9
crime_month.index.month.unique()

## Some Final EDA

In [0]:
# Let's look at the unique values present in parent_incident_type column.
# Fill in this code to get the unique values
crime.__________.unique()

In [0]:
# plot the number of reports for each incident type
crime.________________.___________.plot(kind = '_______')
plt.show()

# Visual Analytics

## Learning objectives
- Be able to graph data over different timescales
- Understand how different timescales provide different types of information
- Be able to create and visualize specific subsets of data
- Develop hypotheses based on visual results
    - Create additional figures to 'test' your hypotheses
- Understand the limits to visual approaches
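
The same events can be counted at several timescales by grouping on different parts of the datetime index. The sketch below uses a synthetic series with exactly one event per day (an assumption made so the counts are easy to check); with real data, each resulting series can be passed to `.plot()`.

```python
import pandas as pd

# Synthetic events: one per day for 730 days starting 2015-01-01
index = pd.date_range("2015-01-01", periods=730, freq="D")
events = pd.Series(1, index=index)

# Long timescale: events per year
by_year = events.groupby(events.index.year).size()
print(by_year)

# Seasonal timescale: events per calendar month, pooled across years
by_month = events.groupby(events.index.month).size()
print(by_month)

# Weekly cycle: events per weekday (Monday = 0)
by_weekday = events.groupby(events.index.weekday).size()
print(by_weekday)
```

Yearly counts show long-term trends, month-of-year counts show seasonality, and weekday counts show the weekly rhythm, so the same data can suggest different hypotheses depending on how it is grouped.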

In [0]:
# let's plot the number of reports for each day of the week
crime.index.weekday.value_counts().sort_index().plot()
plt.show()

In [0]:
# let's add some extra detail to make our plot more useful
crime.index.weekday.value_counts().sort_index().plot()
plt.xlabel('day of week : Monday = 0')
plt.ylabel('number of police incidents')
plt.title('Number of crimes over the week')
plt.show()
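
Instead of asking the reader to remember that Monday = 0, you can put the day names on the axis. A sketch using made-up counts in place of the real `value_counts()` result (the numbers and the `day_names` list are illustrative only):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts indexed 0..6, standing in for value_counts().sort_index()
counts = pd.Series([10, 12, 11, 13, 12, 7, 8], index=range(7))
day_names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# .plot() returns the matplotlib Axes, which lets us relabel the ticks
ax = counts.plot()
ax.set_xticks(range(7))
ax.set_xticklabels(day_names)
ax.set_ylabel("number of police incidents")
ax.set_title("Number of crimes over the week")
plt.show()
```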

### This is weird: Sunday makes sense, but why would Saturday be so low? Maybe the type of crime differs.

In [0]:
# incident types
crime.parent_incident_type.unique()

In [0]:
# create violent crimes data frame
violent = ['Assault' , 'Homicide', 'Robbery', 'Sexual Assault', 'Assualt with Deadly Weapon', 'Disorder']
crime_violent = crime[crime.parent_incident_type.isin(violent)]
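
`isin` silently ignores any label in your list that does not appear in the column, so a misspelled category simply matches nothing. It can be worth checking which labels in the list have no match. A sketch on a made-up frame (the misspelling `'Robery'` is deliberate):

```python
import pandas as pd

# Made-up incidents
toy = pd.DataFrame({"parent_incident_type": ["Theft", "Assault", "Drugs", "Assault"]})

# 'Robery' is deliberately misspelled; 'Homicide' is simply absent from the toy data
wanted = ["Assault", "Homicide", "Robery"]

# Labels in the list that never occur in the column
present = set(toy["parent_incident_type"].unique())
unmatched = sorted(set(wanted) - present)
print(unmatched)

# isin keeps only the rows whose label is in the list
subset = toy[toy["parent_incident_type"].isin(wanted)]
print(len(subset))
```

An unmatched label is not always an error, since a category may just not occur in the data, but it is a quick way to catch typos in either the list or the column.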


In [0]:
# create a non-violent data frame
non_violent = [ 'Drugs', 'Family Offense', 'Liquor', 'Missing Person', 'Pedestrian Stop', 'Property Crime', 'Quality of Life', 'Theft', 'Theft from Vehicle', 'Theft of Vehicle']
crime_non_violent = ______


In [0]:
# create a plot of violent crimes

In [0]:
# create a plot of non-violent crimes

## Plots can be deceiving

In [0]:
# Make the overall crime over the week plot again
crime.index.weekday.value_counts().sort_index().plot()
plt.xlabel('day of week : Monday = 0')
plt.ylabel('number of police incidents')
plt.title('Number of crimes over the week')
plt.show()

In [0]:
# now make the same plot but as a bar graph
