# Indy Action Time

## Imports

In [1]:
# imports used in your project go here 
from modules.wrangle import (get_mac_data,
                             get_clean_mac,
                             X_y_split)

## Acquire
This data was downloaded as a CSV file from the [OpenIndy Data Portal](https://data.indy.gov/datasets/mayors-action-center-service-cases/explore) on July 11, 2023. There are 940,638 observations, each representing one non-emergency service request (one case). There are 15 columns (variables/features), each holding information about the case. The column names are listed below:
* `OBJECTID`, `CASENUMBER`, `SOURCE_ID__C`, `KEYWORD__C`, `SUBCATEGORY__C`, `INCIDENT_ADDRESS__C`, `TOWNSHIP__C`, `CITY__C`, `ZIP__C`, `COUNCIL_DISTRICT__C`, `CREATEDDATE`, `LASTMODIFIEDDATE`, `CLOSEDDATE`, `STATUS`, `ORIGIN`
       
The data begins June 1, 2016 and ends August 9, 2022. An email was sent to Webmaster@indy.gov on August 15, 2023 to resolve the data breakage, with no response. A phone call was also made to the Mayor’s Action Center at +1 317-327-4622 on August 15, 2023 requesting assistance with the data breakage. Request Number 23-00109412 was created and closed without the data being updated.
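The date range above can be checked by parsing the timestamp columns. The following is a minimal sketch, not the project's actual wrangling code: `parse_mac_timestamp` is a hypothetical helper, and the two rows are copied from the sample output in this notebook rather than read from the full file.

```python
import pandas as pd

# Two sample rows copied from this notebook's output; the real file has 940,638 rows.
sample = pd.DataFrame({
    "CASENUMBER": ["22-010673", "22-010026"],
    "CREATEDDATE": ["2022/01/31 15:25:28+00", "2022/01/28 18:35:54+00"],
})


def parse_mac_timestamp(series: pd.Series) -> pd.Series:
    """Hypothetical helper: parse strings like '2022/01/31 15:25:28+00' as UTC."""
    # Drop the trailing '+00' offset, parse the rest, then mark the result as UTC.
    stripped = series.str.replace(r"\+00$", "", regex=True)
    return pd.to_datetime(stripped, format="%Y/%m/%d %H:%M:%S", utc=True)


created = parse_mac_timestamp(sample["CREATEDDATE"])
print(created.min().date(), "to", created.max().date())
```

Run against the full `CREATEDDATE` column, the same two calls to `min()` and `max()` would show the first and last case dates in the download.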

In [8]:
# Import messy data
get_mac_data().head(2)

| Unnamed: 0 | OBJECTID | CASENUMBER | SOURCE_ID__C | KEYWORD__C | SUBCATEGORY__C | INCIDENT_ADDRESS__C | TOWNSHIP__C | CITY__C | ZIP__C | COUNCIL_DISTRICT__C | CREATEDDATE | LASTMODIFIEDDATE | CLOSEDDATE | STATUS | ORIGIN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 498698727 | 22-010673 | INV22-01609 | Illegal Dumping and Junk/Trash | Trash Accumulation or Dumped Materials | 1142 N GOODLET AVE | WAYNE | INDIANAPOLIS | 46222 | 11.0 | 2022/01/31 15:25:28+00 | 2022/02/01 13:59:02+00 | 2022/02/01 13:59:02+00 | Closed | RequestIndy Mobile |
| 1 | 498698728 | 22-010026 | A22-375189 | Animal | Abuse | 3501 RALSTON AVE | CENTER | INDIANAPOLIS | 46218 | 9.0 | 2022/01/28 18:35:54+00 | 2022/01/29 03:43:27+00 | 2022/01/29 03:43:20+00 | Closed | Phone |


## Clean and Prepare
* The columns names were 
* List steps taken to clean your data here
* In particular call out how you handle null values and outliers in detail
* You must do this even if you do not do anything or do not encounter any
* Any time there is potential to make changes to the data you must be upfront about the changes you make or do not make
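One way to be upfront about null values is to report them per column before deciding what to drop or fill. The sketch below is illustrative only: `null_report` is a hypothetical helper, and the toy rows are made up to show the output shape, not taken from the MAC dataset.

```python
import pandas as pd


def null_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: null count and null percentage for every column."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "null_count": counts,
        "null_pct": (counts / len(df) * 100).round(2),
    })
    # Put the columns with the most missing values first.
    return report.sort_values("null_count", ascending=False)


# Made-up rows for illustration only; they are not from the MAC dataset.
toy = pd.DataFrame({
    "ZIP__C": ["46222", None, "46218", "46201"],
    "COUNCIL_DISTRICT__C": [11.0, 9.0, None, None],
    "STATUS": ["Closed", "Closed", "Open", "Closed"],
})
report = null_report(toy)
print(report)
```

A table like this, printed before and after cleaning, makes it easy to state exactly which columns lost rows and why.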

In [4]:
# Import your prepare function and use it to clean your data here

## Explore

* Here you will explore your data then highlight 4 questions that you asked of the data and how those questions influenced your analysis
* Remember to split your data before exploring how different variables relate to one another
* Each question should be stated directly 
* Each question should be supported by a visualization
* Each question should be answered in natural language
* Two questions must be supported by a statistical test, but you may choose to support more than two
* See the following example, and read the comments in the next cell

**The following empty code block** is here to represent the countless questions, visualizations, and statistical tests 
that did not make your final report. Data scientists often create a myriad of questions, visualizations 
and statistical tests that do not make it into the final notebook. This is okay and expected. Remember 
that shotgun approaches to your data such as using pair plots to look at the relationships of each feature 
are a great way to explore your data, but they have no place in your final report. 
**Your final report is about showing and supporting your findings, not showing the work you did to get there!**
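The reminder above to split the data before exploring relationships can be sketched as follows. This is a minimal example using only pandas; `split_train_validate_test`, its 60/20/20 proportions, and the seed are assumptions for illustration and are not the `X_y_split` function imported from `modules.wrangle`.

```python
import pandas as pd


def split_train_validate_test(df, train_frac=0.6, validate_frac=0.2, seed=42):
    """Hypothetical helper: shuffle once, then slice into three disjoint parts."""
    shuffled = df.sample(frac=1, random_state=seed)
    n_train = int(round(len(shuffled) * train_frac))
    n_validate = int(round(len(shuffled) * validate_frac))
    train = shuffled.iloc[:n_train]
    validate = shuffled.iloc[n_train:n_train + n_validate]
    test = shuffled.iloc[n_train + n_validate:]
    return train, validate, test


# Toy frame standing in for the cleaned cases.
toy = pd.DataFrame({"case_id": range(10)})
train, validate, test = split_train_validate_test(toy)
print(len(train), len(validate), len(test))
```

Exploration would then use only `train`, so that nothing learned from `validate` or `test` leaks into feature selection.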

## You may use this as a template for how to ask and answer each question:

### 1) Question about the data
* Ask a question about the data for which you got a meaningful result
* "There is no connection" can be a meaningful result

### 2) Visualization of the data answering the question

* Visualizations should be accompanied by take-aways telling the reader exactly what you want them to get from the chart
* You can include these as bullet points under the chart
* Use your chart title to provide the main take-away from each visualization
* Each visualization should answer one, and only one, of the explore questions

### 3) Statistical test
* Be sure you are using the correct statistical test for the type of variables you are testing
* Be sure that you are not violating any of the assumptions for the statistical test you are choosing
* Your notebook should run and produce the results of the test you are using (This may be done through imports)
* Include an introduction to the kind of test you are doing
* Include the Ho and Ha for the test
* Include the alpha you are using
* Include the readout of the p-value for the test
* Interpret the results of the test in natural language ("I reject the null hypothesis" is not sufficient)
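As an illustration of the checklist above, here is a sketch of a chi-square test of independence, which suits two categorical variables. It assumes SciPy is installed. The counts are invented for the example and say nothing about the real MAC data; the choice of `ORIGIN` and `STATUS` as the pair to test is also only an assumption.

```python
import pandas as pd
from scipy import stats

# Invented counts for illustration only; not results from the MAC dataset.
rows = (
    [("Phone", "Closed")] * 30
    + [("Phone", "Open")] * 10
    + [("RequestIndy Mobile", "Closed")] * 10
    + [("RequestIndy Mobile", "Open")] * 30
)
toy = pd.DataFrame(rows, columns=["ORIGIN", "STATUS"])

# Ho: ORIGIN and STATUS are independent.
# Ha: ORIGIN and STATUS are not independent.
alpha = 0.05

observed = pd.crosstab(toy["ORIGIN"], toy["STATUS"])
chi2, p, dof, expected = stats.chi2_contingency(observed)

print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
if p < alpha:
    print("Reject Ho: the toy data suggests status depends on how the case was submitted.")
else:
    print("Fail to reject Ho: no evidence in the toy data that status depends on origin.")
```

The chi-square test assumes every expected cell count is reasonably large (a common rule of thumb is at least 5), which can be checked by inspecting `expected`.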

### 4) Answer to the question
* Answer the question you posed of the data by referring to the chart and statistical test (if you used one)
* If the question relates to drivers, explain why the feature in question would/wouldn't make a good driver

## Exploration Summary
* After your explore section, before you start modeling, provide a summary of your findings in Explore
* Include a summary of your take-aways
* Include a summary of the features you examined and whether or not you will be going to Modeling with each feature and why
* It is important to note which features will be going into your model so the reader knows what features you are using to model on

## Modeling

### Introduction
* Explain how you will be evaluating your models
* Include the evaluation metric you will be using and why you have chosen it
* Create a baseline and briefly explain how it was calculated 
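For a classification target, one common baseline is to always predict the most frequent class and report the resulting accuracy. The sketch below assumes accuracy as the evaluation metric and uses a made-up target; `baseline_accuracy` is a hypothetical helper, and a regression project would use the mean or median instead.

```python
import pandas as pd


def baseline_accuracy(y: pd.Series):
    """Hypothetical helper: accuracy of always predicting the most common class."""
    most_common = y.mode()[0]
    accuracy = (y == most_common).mean()
    return most_common, accuracy


# Made-up training target for illustration only.
y_train = pd.Series(["Closed"] * 8 + ["Open"] * 2)
label, acc = baseline_accuracy(y_train)
print(f"Baseline: {acc:.2f} accuracy (always predicting '{label}')")
```

Any model worth keeping should beat this number on both train and validate.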

In [3]:
# If you use code to generate your baseline run the code and generate the output here

Printout should read: <br>
Baseline: "number" "evaluation metric"

### Best 3 Models
* Show the three best model results obtained using your selected features to predict the target variable
* Typically students will show the top models they are able to generate for three different model types

## You may use this as a template for how to introduce your models:

### Model Type

In [4]:
# Code that runs the best model in that model type goes here 
# (This may be imported from a module)

Printout of model code should read: <br>
"Model Type" <br>
"evaluation metric" on train: "evaluation result" <br>
"evaluation metric" on validate: "evaluation result"

### Test Model
* Choose the best model out of the three as your best model and explain why you have chosen it
* Explain that you will now run your final model on test data to gauge how it will perform on unseen data

In [5]:
# Code that runs the best overall model on test data (this may be imported from a module)

Printout of model code should read: <br>
"Model Type" <br>
"evaluation metric" on Test: "evaluation result" <br>

### Modeling Wrap 
* Give a final interpretation of how the model's test score compares to the baseline and whether you would recommend this model for production

## Conclusion

### Summary
* Summarize your findings and answer the questions you brought up in explore 
* Summarize how drivers discovered led or did not lead to a successful model 

### Recommendations
* Recommendations are actions the stakeholder should take based on your insights

### Next Steps
* Next Steps are what you, as a Data Scientist, would do if provided more time to work on the project

**Where there is code in your report there should also be code comments telling the reader what each code block is doing. This is true for any and all code blocks even if you are using a function to import code from a module.**
<br>
<br>
**Your Notebook should contain adequate markdown that documents your thought process, decision making, and navigation through the pipeline. As a Data Scientist, your job does not end with making data discoveries. It includes effectively communicating those discoveries as well. This means documentation is a critical part of your job.**

# README

Your README should contain all of the following elements:

* **Title** Gives the name of your project
* **Project Description** Describes what your project is and why it is important 
* **Project Goal** Clearly states what your project sets out to do and how the information gained can be applied to the real world
* **Initial Hypotheses** Initial questions used to focus your project 
* **Project Plan** Guides the reader through the different stages of the pipeline as they relate to your project
* **Data Dictionary** Gives a definition for each of the features used in your report and the units they are measured in, if applicable
* **Steps to Reproduce** Gives instructions for reproducing your work, i.e., running your notebook on someone else's computer.