# **<span style="color: red;">REGRESSION TASK</span>**

# 1. Violent Crime Rate California 2000-2013
---
***(https://catalog.data.gov/dataset/violent-crime-rate-94cb9)***

## 1.1 Importing Libraries
---
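These libraries are required for the regression task: pandas and NumPy for data handling, matplotlib for plotting, and scikit-learn for splitting the data, fitting a linear model, and scoring it.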

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
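
As a rough illustration of how these imports usually fit together, the sketch below fits a linear model on synthetic data and scores it on a held-out split. The data, the variable names, and the split settings are assumptions made for the example; they are not taken from the crime-rate dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data (hypothetical): y = 3x + 2 plus a little Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training split, predict on the test split.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Score the predictions with the two imported metrics.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(f"MSE={mse:.3f}, R^2={r2:.3f}")
```

Scoring on rows the model never saw during fitting is what makes the MSE and R^2 values a fair estimate of how the model generalizes.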

## 1.2 Print
---
Load the .csv file and print it to check for errors.

In [2]:
data = pd.read_csv("/kaggle/input/california-crimerate/violent-crime-rate-california-2000-2013.csv")
print(data)

             ind_id                                 ind_definition  \
0               752  Number of Violent Crimes per 1,000 Population   
1               752  Number of Violent Crimes per 1,000 Population   
2               752  Number of Violent Crimes per 1,000 Population   
3               752  Number of Violent Crimes per 1,000 Population   
4               752  Number of Violent Crimes per 1,000 Population   
...             ...                                            ...   
49222           752  Number of Violent Crimes per 1,000 Population   
49223           752  Number of Violent Crimes per 1,000 Population   
49224           752  Number of Violent Crimes per 1,000 Population   
49225           752  Number of Violent Crimes per 1,000 Population   
49226  END OF TABLE                                            NaN   

       reportyear  race_eth_code race_eth_name geotype  geotypevalue  \
0          2000.0            9.0         Total      CA           6.0   
1          2000

  data = pd.read_csv("/kaggle/input/california-crimerate/violent-crime-rate-california-2000-2013.csv")
  has_large_values = (abs_vals > 1e6).any()
  has_small_values = ((abs_vals < 10 ** (-self.digits)) & (abs_vals > 0)).any()
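
The printed table ends with an `END OF TABLE` row whose other fields are `NaN`. The sketch below, built on a tiny in-memory CSV that is hypothetical and only mimics that shape, shows one way to spot such a marker row and to count missing values per column. The column subset and the values are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical miniature of the file: two data rows plus a trailing marker row.
csv_text = """ind_id,reportyear,rate
752,2000,6.56
752,2001,
END OF TABLE,,
"""
sample = pd.read_csv(io.StringIO(csv_text))

# The text marker keeps ind_id from being parsed as a numeric column.
print(sample.dtypes)

# Count missing values in each column.
print(sample.isna().sum())

# Rows whose ind_id cannot be converted to a number are marker rows.
ids = pd.to_numeric(sample["ind_id"], errors="coerce")
marker_rows = sample[ids.isna()]
print(marker_rows)
```

`pd.to_numeric(..., errors="coerce")` turns unparseable entries into `NaN` instead of raising, which makes the non-numeric rows easy to filter.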


## 1.3 Checking Column Names
---
Before cleaning the data, list the column names so it is clear which columns are available to keep or drop.

In [3]:
print(data.columns)

Index(['ind_id', 'ind_definition', 'reportyear', 'race_eth_code',
       'race_eth_name', 'geotype', 'geotypevalue', 'geoname', 'county_fips',
       'county_name', 'region_code', 'region_name', 'strata_name_code',
       'strata_name', 'strata_level_name_code', 'strata_level_name',
       'numerator', 'denominator', 'rate', 'll_95ci', 'ul_95ci', 'se', 'rse',
       'ca_decile', 'ca_rr', 'dof_population', 'version'],
      dtype='object')
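
Column names alone do not show which columns are useful. A minimal sketch, assuming a small hypothetical frame rather than the real file, of summarizing each column's type, missing count, and number of distinct values:

```python
import pandas as pd

# Hypothetical sample; the values are made up for illustration.
sample = pd.DataFrame({
    "ind_id": [752, 752, 752, 752],
    "geoname": ["City A", "City B", "City C", None],
    "rate": [6.5, 4.2, 5.4, None],
})

# One row per column: dtype, missing values, distinct non-missing values.
summary = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "missing": sample.isna().sum(),
    "unique": sample.nunique(),
})
print(summary)

# Columns with a single distinct value carry no information for a model.
constant_cols = summary.index[summary["unique"] <= 1].tolist()
print("Constant columns:", constant_cols)
```

In the printed data above, `ind_id` is 752 in every visible row, so a check like this would likely flag it as a candidate for dropping.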


## 1.4 Cleaning
---
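Rows with missing values, such as the trailing `END OF TABLE` row, are causing the warnings that appear in section **1.2 Print**. This cell drops those rows and then drops the identifier and code columns.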

In [4]:
data.dropna(inplace=True) # drop every row that has at least one missing value
print(data)
print() # space
print(data.describe()) # check to see if any more data needs to be dropped
# identifier and code columns to remove
droplist = ['ind_id', 'race_eth_code', 'race_eth_name', 'geotype', 'geotypevalue', 'county_fips', 'region_code', 'strata_name_code', 'strata_level_name_code']
data.drop(droplist, axis=1, inplace=True)

      ind_id                                 ind_definition  reportyear  \
300      752  Number of Violent Crimes per 1,000 Population      2000.0   
305      752  Number of Violent Crimes per 1,000 Population      2000.0   
310      752  Number of Violent Crimes per 1,000 Population      2000.0   
316      752  Number of Violent Crimes per 1,000 Population      2000.0   
321      752  Number of Violent Crimes per 1,000 Population      2000.0   
...      ...                                            ...         ...   
49131    752  Number of Violent Crimes per 1,000 Population      2013.0   
49139    752  Number of Violent Crimes per 1,000 Population      2013.0   
49144    752  Number of Violent Crimes per 1,000 Population      2013.0   
49149    752  Number of Violent Crimes per 1,000 Population      2013.0   
49154    752  Number of Violent Crimes per 1,000 Population      2013.0   

       race_eth_code race_eth_name geotype  geotypevalue            geoname  \
300              9.0
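
`dropna()` with no arguments removes a row if any column is missing, so the order of the two steps matters. The sketch below uses a hypothetical three-column frame to compare that behavior with an alternative ordering: drop the unneeded columns first, then drop rows only where chosen columns are missing. This is an alternative for comparison, not the approach used in the cell above, and the column choice is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; county_fips stands in for a column that is later dropped.
sample = pd.DataFrame({
    "geoname": ["City A", "City B", "City C", None],
    "county_fips": [np.nan, 6037.0, np.nan, np.nan],
    "rate": [6.5, 1.8, np.nan, np.nan],
})

# Strict: any missing value in any column removes the row.
strict = sample.dropna()
print(len(strict), "row(s) kept by dropna()")

# Targeted: remove the unneeded column first, then require only the
# columns the analysis actually uses.
targeted = (
    sample.drop(columns=["county_fips"])
          .dropna(subset=["geoname", "rate"])
)
print(len(targeted), "row(s) kept by the targeted version")
```

Returning new frames instead of using `inplace=True` also keeps the original data available for comparison.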

## 1.5 Checking

In [5]:
print(data)
data.head()

                                      ind_definition  reportyear  \
300    Number of Violent Crimes per 1,000 Population      2000.0   
305    Number of Violent Crimes per 1,000 Population      2000.0   
310    Number of Violent Crimes per 1,000 Population      2000.0   
316    Number of Violent Crimes per 1,000 Population      2000.0   
321    Number of Violent Crimes per 1,000 Population      2000.0   
...                                              ...         ...   
49131  Number of Violent Crimes per 1,000 Population      2013.0   
49139  Number of Violent Crimes per 1,000 Population      2013.0   
49144  Number of Violent Crimes per 1,000 Population      2013.0   
49149  Number of Violent Crimes per 1,000 Population      2013.0   
49154  Number of Violent Crimes per 1,000 Population      2013.0   

                 geoname     county_name          region_name  \
300        Adelanto city  San Bernardino  Southern California   
305    Agoura Hills city     Los Angeles  Southern Ca

| | ind_definition | reportyear | geoname | county_name | region_name | strata_name | strata_level_name | numerator | denominator | rate | ll_95ci | ul_95ci | se | rse | ca_decile | ca_rr | dof_population | version |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 300 | Number of Violent Crimes per 1,000 Population | 2000.0 | Adelanto city | San Bernardino | Southern California | Type of violent crime | Violent crime total | 119.0 | 18130.0 | 6.563707 | 5.384386 | 7.743027 | 0.601694 | 9.166985 | 3.0 | 1.055683 | 18130.0 | 10/21/2015 |
| 305 | Number of Violent Crimes per 1,000 Population | 2000.0 | Agoura Hills city | Los Angeles | Southern California | Type of violent crime | Violent crime total | 36.0 | 20537.0 | 1.752934 | 1.180309 | 2.325559 | 0.292156 | 16.666667 | 9.0 | 0.281936 | 20537.0 | 10/21/2015 |
| 310 | Number of Violent Crimes per 1,000 Population | 2000.0 | Alameda city | Alameda | Bay Area | Type of violent crime | Violent crime total | 302.0 | 72259.0 | 4.17941 | 3.708034 | 4.650786 | 0.240498 | 5.754353 | 5.0 | 0.672201 | 72259.0 | 10/21/2015 |
| 316 | Number of Violent Crimes per 1,000 Population | 2000.0 | Albany city | Alameda | Bay Area | Type of violent crime | Violent crime total | 88.0 | 16444.0 | 5.351496 | 4.233372 | 6.46962 | 0.570471 | 10.660036 | 4.0 | 0.860715 | 16444.0 | 10/21/2015 |
| 321 | Number of Violent Crimes per 1,000 Population | 2000.0 | Alhambra city | Los Angeles | Southern California | Type of violent crime | Violent crime total | 253.0 | 85804.0 | 2.94858 | 2.585244 | 3.311917 | 0.185376 | 6.286946 | 7.0 | 0.474239 | 85757.0 | 10/21/2015 |
