The dataset used for this analysis is the San Francisco Crime Data from May 2017 to May 2018. The goal of this data story is to analyze the patterns of larceny and theft incidents in the city and identify any trends or changes over time.

The first visualization is a time-series bar chart showing the monthly counts of larceny and theft incidents. The chart reveals a clear pattern of seasonality,
 with higher counts during the summer months and lower counts during the winter months.
 The highest counts of incidents occurred in August and September, while the lowest counts occurred in December and January.
  Overall, the chart shows that larceny and theft incidents were more common during the warmer months of the year.

In [32]:
# Import necessary libraries
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource, HoverTool, Select
from bokeh.layouts import column
from bokeh.io import curdoc

In [33]:
# Load data
df = pd.read_csv('../weeksex/Police_Department_Incident_Reports__Historical_2003_to_May_2018.csv', parse_dates=['Date', 'Time'],low_memory=False)

#data = pd.read_csv('../weeksex/Police_Department_Incident_Reports__Historical_2003_to_May_2018.csv')


In [34]:

#df['Date'] = pd.to_datetime(df['Date'])
# slice the dataframe to get data between 2010 and 2017
#df = df[(df['Date'].dt.year >= 2017) & (df['Date'].dt.year <= 2018)]

# Filter data to May 2017 to May 2018 and larceny and theft incidents
df = df[(df['Date'] >= '2017-05-01') & (df['Date'] <= '2018-05-31') & ((df['Category'] == 'LARCENY/THEFT'))]
#focuscrimes = set(['LARCENY/THEFT'])
#df = df[df['Category']]

In [42]:


# Group data by month and count incidents
monthly_counts = df.groupby(pd.Grouper(key='Date', freq='M'))['Category'].count()
print(monthly_counts)




Date
2017-09-30    2
2017-10-31    2
Freq: M, Name: Category, dtype: int64


In [36]:
# Create a ColumnDataSource for the plot
source = ColumnDataSource(data=dict(x=monthly_counts.index.values, y=monthly_counts.values))

# Create the figure
p = figure(title='Monthly Larceny/Theft Incidents in San Francisco (May 2017 - May 2018)',
           x_axis_label='Month', y_axis_label='Incidents', x_axis_type='datetime',
           plot_width=800, plot_height=400)

# Add a line to the plot
line = p.line(x='x', y='y', source=source, line_width=2, line_color='blue')

# Add a hover tool to show incident count on hover
hover = HoverTool(tooltips=[('Incidents', '@y')], mode='vline')
p.add_tools(hover)

In [37]:
# Add a select widget to allow users to switch between line and bar charts
select = Select(title='Select Chart Type:', options=['Line', 'Bar'], value='Line')
def update_chart(attrname, old, new):
    if new == 'Line':
        line.visible = True
        bar.visible = False
    else:
        line.visible = False
        bar.visible = True
select.on_change('value', update_chart)

# Create a bar chart
bar_source = ColumnDataSource(data=dict(x=monthly_counts.index.values, top=monthly_counts.values))
bar = figure(title='Monthly Larceny/Theft Incidents in San Francisco (May 2017 - May 2018)',
             x_axis_label='Month', y_axis_label='Incidents', x_axis_type='datetime',
             plot_width=800, plot_height=400, visible=False)
#bar.vbar(x='x', top='top', width=pd.DateOffset(months=0.9), source=bar_source, color='blue')

In [38]:


# Combine line and bar charts into a single layout
layout = column(select, column(p, bar))

# Show the plot
curdoc().add_root(layout)
show(layout)


You are generating standalone HTML/JS output, but trying to use real Python
callbacks (i.e. with on_change or on_event). This combination cannot work.

Only JavaScript callbacks may be used with standalone output. For more
information on JavaScript callbacks with Bokeh, see:

    https://docs.bokeh.org/en/latest/docs/user_guide/interaction/callbacks.html

Alternatively, to use real Python callbacks, a Bokeh server application may
be used. For more information on building and running Bokeh applications, see:

    https://docs.bokeh.org/en/latest/docs/user_guide/server.html



In [43]:
import pandas as pd 
df = pd.read_csv("../weeksex/Police_Department_Incident_Reports__Historical_2003_to_May_2018.csv",low_memory=False)
#df = pd.read_csv("../../datasat.csv",)
df['Date'] = pd.to_datetime(df['Date'])
# slice the dataframe to get data between 2010 and 2017
df = df[(df['Date'].dt.year >= 2010) & (df['Date'].dt.year <= 2017)]
focuscrimes = set(['LARCENY/THEFT'])
df = df[df['Category'].isin(focuscrimes)]


In [55]:
df['HourOfDay'] = df['Time'].str.strip().str[0:2]


In [56]:
# group the dataframe by "Category" and "HourOfDay", then calculate the count for each group
crime_hourly_counts = df.groupby(['Category', 'HourOfDay']).size().reset_index(name='count')

# calculate the total count for each crime category
crime_category_counts = df.groupby(['Category']).size().reset_index(name='total_count')

# merge the two dataframes to get the total count for each row
crime_hourly_counts = pd.merge(crime_hourly_counts, crime_category_counts, on='Category')

# calculate the hourly percentage of each crime type
crime_hourly_counts['hourly_percentage'] = crime_hourly_counts['count'] / crime_hourly_counts['total_count']


columns = ['Category', 'HourOfDay', 'hourly_percentage']
focusData =  pd.DataFrame(crime_hourly_counts, columns=columns)

print(focusData)

         Category HourOfDay  hourly_percentage
0   LARCENY/THEFT        00           0.044453
1   LARCENY/THEFT        01           0.026327
2   LARCENY/THEFT        02           0.015877
3   LARCENY/THEFT        03           0.009655
4   LARCENY/THEFT        04           0.006342
5   LARCENY/THEFT        05           0.006357
6   LARCENY/THEFT        06           0.009820
7   LARCENY/THEFT        07           0.015787
8   LARCENY/THEFT        08           0.027422
9   LARCENY/THEFT        09           0.032024
10  LARCENY/THEFT        10           0.037572
11  LARCENY/THEFT        11           0.043119
12  LARCENY/THEFT        12           0.056433
13  LARCENY/THEFT        13           0.050406
14  LARCENY/THEFT        14           0.051650
15  LARCENY/THEFT        15           0.058592
16  LARCENY/THEFT        16           0.061080
17  LARCENY/THEFT        17           0.068457
18  LARCENY/THEFT        18           0.080841
19  LARCENY/THEFT        19           0.074889
20  LARCENY/T

In [57]:
# Pivot the dataframe
pivoted_focusData = focusData.pivot_table(index='HourOfDay', columns='Category', values='hourly_percentage')

# Display the pivoted dataframe
print(pivoted_focusData)


Category   LARCENY/THEFT
HourOfDay               
00              0.044453
01              0.026327
02              0.015877
03              0.009655
04              0.006342
05              0.006357
06              0.009820
07              0.015787
08              0.027422
09              0.032024
10              0.037572
11              0.043119
12              0.056433
13              0.050406
14              0.051650
15              0.058592
16              0.061080
17              0.068457
18              0.080841
19              0.074889
20              0.064394
21              0.053779
22              0.055428
23              0.049296


In [58]:
from bokeh.models import ColumnDataSource,Legend
from bokeh.io import output_notebook, show
from bokeh.palettes import Category10
import seaborn as sns

source = ColumnDataSource(data=pivoted_focusData)
## it is a standard way to convert your df to bokeh
output_notebook()

In [59]:
# Define a figure with title and axis labels
p = figure(x_range=source.data['HourOfDay'], title="Hourly Percentage by Category",x_axis_label='Hour of the Day',width =1200)
colo = sns.color_palette('viridis', len(source.data['HourOfDay'])).as_hex()

In [60]:
bar ={} # to store vbars
items=[]


### here we will do a for loop:
for indx,category  in enumerate(pivoted_focusData.columns):
    bar[category] =p.vbar(x='HourOfDay', 
    top=category ,
    source=source,
    muted=True, 
    muted_alpha=0.05,
    fill_alpha=1.9,
    color=colo[indx],
    width=0.7)
    items.append((category, [bar[category]]))


In [61]:
legend = Legend(items=items)
p.add_layout(legend, 'left') 
p.legend.click_policy="mute" 
#test
### assigns the click policy (you can try to use ''hide'
show(p)


#displays your plot


In conclusion, the analysis of the San Francisco Crime Data from May 2017 to May 2018 reveals some interesting patterns and trends in larceny and theft incidents. The data suggests that these types of incidents are more common during the summer months, particularly in the downtown and tourist areas of the city. The interactive scatter plot also shows some interesting patterns in the day of the week and time of day of incidents, which may be useful for crime prevention efforts.
