<h1> Data Visualization Project </h1>

<h3>Part 1: Analysis of Audience Interest in Data Science Topics</h3>

This part explores data from a survey in which respondents were asked how interested they are in different Data Science topics.

In [None]:
#importing required libraries
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Folium installed and imported!')

In [None]:
#importing survey results
df1 = pd.read_csv("https://cocl.us/datascience_survey_data", index_col=0)
df1

In [None]:
#sort in descending order of Very Interested
df1.sort_values(by='Very interested', ascending=False, axis=0, inplace=True)

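#formatting the data - represent all numbers as percentages
#2233 is the total used as the denominator (the number of survey respondents)
total_respondents = 2233
df1["Very interested"] = round(df1["Very interested"]*100/total_respondents, 2)
df1["Somewhat interested"] = round(df1["Somewhat interested"]*100/total_respondents, 2)
df1["Not interested"] = round(df1["Not interested"]*100/total_respondents, 2)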
df1
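
The per-column conversion above could also be written as one vectorized step. The following is a minimal sketch on made-up numbers: `counts`, `n_respondents`, and the topic names are illustrative assumptions, not the survey data.

```python
import pandas as pd

# Hypothetical counts for two topics (not the real survey data)
counts = pd.DataFrame(
    {
        "Very interested": [50, 150],
        "Somewhat interested": [100, 40],
        "Not interested": [50, 10],
    },
    index=["Topic B", "Topic A"],
)

n_respondents = 200  # assumed total number of respondents for this toy example

# Convert every column to a percentage at once and round to two decimals
percentages = (counts * 100 / n_respondents).round(2)

# Sort by the share of "Very interested" answers, largest first
percentages = percentages.sort_values(by="Very interested", ascending=False)
print(percentages)
```

Dividing the whole DataFrame at once keeps the denominator in a single place, which may make it harder to apply the conversion to only some of the columns by mistake.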

In [None]:
#plotting the chart using artist layer
ax = df1.plot.bar(
                    width=0.8, 
                    figsize=(20,8),
                    color = ['#5cb85c', '#5bc0de', '#d9534f'],
                    fontsize = 14
                    )
ax.set_title('Percentage of Respondents\' Interest in Data Science Areas', size=16)

#improving presentation
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
ax.get_yaxis().set_visible(False)

#displaying percentages above bars
for index, value in enumerate(df1['Very interested']):
    label = str(value) + "%"
    ax.annotate(label, xy=(index-0.44, value+0.5), size=14)
for index, value in enumerate(df1['Somewhat interested']):
    label = str(value) + "%"
    ax.annotate(label, xy=(index-0.13, value+0.5), size=14)
for index, value in enumerate(df1['Not interested']):
    label = str(value) + "%"
    ax.annotate(label, xy=(index+0.15, value+0.5), size=14)
    
ax.legend(fontsize=14)
plt.show()
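
The label positions above rely on hand-tuned x offsets for each column. As a possible alternative, the sketch below reads each bar's position from the plotted rectangles in `ax.patches`, so the labels stay centered if the bar width or the number of columns changes. The `demo` DataFrame and its values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical percentages (not the real survey data)
demo = pd.DataFrame(
    {"Very interested": [75.0, 25.0], "Not interested": [5.0, 25.0]},
    index=["Topic A", "Topic B"],
)

ax = demo.plot.bar(width=0.8, figsize=(8, 4))

labels = []
for bar in ax.patches:  # one rectangle per drawn bar
    height = float(bar.get_height())
    text = f"{height}%"
    # Place the label at the horizontal center of the bar, 3 points above its top
    ax.annotate(
        text,
        xy=(bar.get_x() + bar.get_width() / 2, height),
        xytext=(0, 3),
        textcoords="offset points",
        ha="center",
        va="bottom",
    )
    labels.append(text)

plt.close(ax.figure)
```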

<h3> Part 2: Crime Rates in San Francisco </h3>

This part examines crimes that occurred in San Francisco, US, and plots a choropleth map showing the number of crimes in each police district of the city.

In [None]:
#importing dataset
df2 = pd.read_csv("https://cocl.us/sanfran_crime_dataset")
df2.head()

In [None]:
#cleaning the data
df2.drop(['IncidntNum', 'Category', 'Descript', 'DayOfWeek','Date','Time','Resolution','Address','X','Y','Location'], axis=1, inplace=True)
df2.rename(columns={'PdDistrict':'Neighborhood','PdId':'Count'}, inplace=True)
df2.head()

In [None]:
df2['Neighborhood'].value_counts()

In [None]:
#aggregating into one row per neighborhood with its incident count
df2 = df2.groupby('Neighborhood', as_index=False).count()
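
A choropleth needs one row per district together with its incident count. The sketch below shows that aggregation on a tiny made-up table: the `incidents` rows are invented, and only the column names `PdDistrict` and `PdId` are taken from the dataset above.

```python
import pandas as pd

# Hypothetical incident rows (not the real San Francisco data)
incidents = pd.DataFrame(
    {
        "PdDistrict": ["MISSION", "MISSION", "RICHMOND", "MISSION", "RICHMOND", "PARK"],
        "PdId": [101, 102, 103, 104, 105, 106],
    }
)

# Count the rows in each district and return a flat two-column table
district_counts = (
    incidents.groupby("PdDistrict")
    .size()
    .reset_index(name="Count")
    .rename(columns={"PdDistrict": "Neighborhood"})
)
print(district_counts)
```

Using `size()` counts rows directly, so the result does not depend on which other columns happen to remain in the table.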

In [None]:
df2.columns

In [None]:
#downloading San Francisco GeoJSON file
!wget --quiet https://cocl.us/sanfran_geojson
    
print('GeoJSON file downloaded!')

In [None]:
sanfran_geo = r'sanfran_geojson'

In [None]:
# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42

# create a SF map
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

In [None]:
#creating a choropleth map depicting the number of crimes per district
sanfran_map.choropleth(
        geo_data = sanfran_geo,
        data = df2,
        columns = ['Neighborhood', 'Count'],
        key_on = 'feature.properties.DISTRICT',
        fill_color = 'YlOrRd',
        fill_opacity = 0.7,
        line_opacity = 0.2,
        legend_name = 'Crime Rate in San Francisco',
        reset = True
)
sanfran_map