# Introduction

In this section, we'll create summaries and visualizations to explore the data.

**Note:** The notebooks with exploratory visualizations have been split into multiple parts to reduce file size. Dependencies required across all EDA notebooks are imported below. To view animated plots that do not render on GitHub, paste the notebook's URL into [Jupyter Notebook Viewer](https://nbviewer.jupyter.org/).

# Loading Dependencies

In [1]:
# Standard library
import os
import random
import re
import string
from os import path

# Data handling and plotting
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import seaborn as sns
from PIL import Image
from wordcloud import WordCloud, STOPWORDS

# Text processing
import nltk
from nltk import word_tokenize
from nltk.collocations import *
from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.stem import WordNetLemmatizer

# Google Colab / Drive access
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Display Plotly figures inline
pio.renderers.default = "plotly_mimetype+notebook_connected"

In [2]:
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [5]:
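from google.colab import drive  # note: rebinds the name `drive` from the previous cell

# Mount Google Drive and load the 311 dataset; csv_path avoids shadowing os.path
drive.mount("/content/drive")
csv_path = '/content/drive/MyDrive/Colab Notebooks/community_board_311.csv'
df = pd.read_csv(csv_path, index_col=0)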

Mounted at /content/drive

elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison

# Value Count Bar Plots for Complaint Type and Agency 

We'll use Plotly horizontal bar charts to visualize how calls are distributed across complaint types and agencies. The counts for the 30 most frequent complaint types are assigned to an x list and the corresponding complaint types to a y list. Both lists are reversed so that the most frequent complaint type appears at the top of the chart, since Plotly draws the first row of a horizontal bar chart at the bottom. The lists are then combined into a dataframe and plotted with `plotly.express.bar`.

In [6]:
# Counts for the 30 most frequent complaint types (value_counts sorts descending)
complaint_type_counts = df["complaint_type"].value_counts().head(30)
complaint_type_x = list(complaint_type_counts)
complaint_type_y = list(complaint_type_counts.index)

# Reverse so the most frequent type is drawn at the top of the chart
complaint_type_x.reverse()
complaint_type_y.reverse()

complaint_type_count_df = pd.DataFrame({"Total Calls": complaint_type_x, "Complaint Type": complaint_type_y})

# "Total Calls" is numeric, so the palette is passed as a continuous color scale
complaint_type_fig = px.bar(complaint_type_count_df, x="Total Calls", y="Complaint Type",
                            orientation='h',
                            height=700, color="Total Calls",
                            color_continuous_scale=px.colors.sequential.thermal)

complaint_type_fig.update_layout(hovermode='x',
                                 title="Top 30 Most Frequent 311 Complaint Types",
                                 font=dict(family="silom", size=14, color="#58508d"))

complaint_type_fig.show()

Complaints related to noise, illegal parking, damaged trees, sanitation, and utilities are the most common. The same process is repeated below to create a bar chart of the agency value counts.

In [7]:
# Call counts per agency (value_counts sorts descending)
agency_counts = df["agency"].value_counts()
agency_x = list(agency_counts)
agency_y = list(agency_counts.index)

# Reverse so the busiest agency is drawn at the top of the chart
agency_x.reverse()
agency_y.reverse()

agency_count_df = pd.DataFrame({"Total Calls": agency_x, "Agency": agency_y})

# "Total Calls" is numeric, so the palette is passed as a continuous color scale
agency_fig = px.bar(agency_count_df, x="Total Calls", y="Agency",
                    orientation='h',
                    height=700, color="Total Calls",
                    color_continuous_scale=px.colors.sequential.thermal)

agency_fig.update_layout(hovermode='x',
                         title="311 Call Counts by Agency",
                         font=dict(family="silom", size=14, color="#58508d"))

agency_fig.show()
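
Both bar charts above are built from the same steps: count the values, optionally keep the top N, reverse the order, and build a two-column dataframe. A minimal sketch of how those steps could be wrapped in one function is shown here. The helper name `value_count_frame` and the toy data are hypothetical and are not part of the original notebook.

```python
import pandas as pd


def value_count_frame(df, column, label, top_n=None):
    """Return value counts as a two-column frame, least frequent first.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    counts = df[column].value_counts()  # sorted from most to least frequent
    if top_n is not None:
        counts = counts.head(top_n)     # keep only the top_n categories
    counts = counts.iloc[::-1]          # reverse so the largest bar is drawn at the top
    return pd.DataFrame({"Total Calls": counts.to_numpy(),
                         label: counts.index.to_numpy()})


# Toy data standing in for the 311 dataframe (assumed values)
toy = pd.DataFrame({"agency": ["NYPD", "NYPD", "NYPD", "DSNY", "DSNY", "DOE"]})

agency_counts_toy = value_count_frame(toy, "agency", "Agency")
top_two_toy = value_count_frame(toy, "agency", "Agency", top_n=2)
print(agency_counts_toy)
```

The returned frame can be passed to `px.bar` with `x="Total Calls"` and `y` set to the label column, in the same way as the cells above.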

The New York Police Department (NYPD) responds to the majority of 311 calls in New York City. Conversely, the Department of Education (DOE) and the Department of Information Technology & Telecommunications (DOITT) respond to so few calls that their counts do not appear in the hover data of the plot above. In the modeling phase, these agencies will be oversampled so that the models have enough examples to recognize their calls.
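
The text does not say which oversampling method will be used in the modeling phase. As one possible illustration, the sketch below performs simple random oversampling with pandas, resampling rare classes with replacement until each has a minimum number of rows. The function name `oversample_minority`, the `min_rows` threshold, and the toy data are all assumptions made for this example.

```python
import pandas as pd


def oversample_minority(df, label_col, min_rows, random_state=42):
    """Resample classes with fewer than min_rows rows, with replacement.

    Hypothetical helper; the notebook's actual oversampling method may differ.
    """
    parts = []
    for _, group in df.groupby(label_col):
        if len(group) < min_rows:
            # Draw min_rows rows with replacement from the rare class
            group = group.sample(n=min_rows, replace=True, random_state=random_state)
        parts.append(group)
    return pd.concat(parts, ignore_index=True)


# Toy data standing in for the 311 dataframe (assumed values)
toy_calls = pd.DataFrame({
    "agency": ["NYPD"] * 6 + ["DOE"] * 2 + ["DOITT"],
    "descriptor": [f"call {i}" for i in range(9)],
})

balanced = oversample_minority(toy_calls, "agency", min_rows=4)
balanced_counts = balanced["agency"].value_counts().to_dict()
print(balanced_counts)
```

Oversampling of this kind is usually applied only to the training split, so that duplicated rows do not leak into the evaluation data.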

# References
[Bar Charts in Python](https://plotly.com/python/bar-charts/)