**1) Using the scikit-learn documentation for Recursive Feature Elimination (RFE), https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html, apply this process to the crime dataset to create the best model. Since this dataset is so small, you do not need to perform a train-test split. You can choose what you're trying to predict. Be sure to explain what RFE is in the markdown.**
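Recursive feature elimination (RFE) works backwards from the full feature set. It fits the estimator on all features, ranks them by the size of the fitted coefficients (`coef_`) or `feature_importances_`, removes the `step` weakest features, and refits on what remains. This repeats until `n_features_to_select` features are left. Selected features get `ranking_` 1, and features eliminated earlier get larger ranks. The sketch below re-implements that loop by hand for a `LinearRegression` estimator on synthetic data. It is an illustration only, not scikit-learn's actual code. `manual_rfe` is a hypothetical helper, and its result is compared with `sklearn.feature_selection.RFE`.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression


def manual_rfe(X, y, n_features_to_select, step=1):
    """Hypothetical helper: a hand-written RFE loop for illustration only."""
    n_features = X.shape[1]
    support = np.ones(n_features, dtype=bool)  # True = feature still kept
    ranking = np.ones(n_features, dtype=int)   # 1 = selected
    while support.sum() > n_features_to_select:
        features = np.flatnonzero(support)
        # Refit on the surviving features and rank them by |coefficient|
        model = LinearRegression().fit(X[:, features], y)
        importance = np.abs(model.coef_)
        # Drop at most `step` features, never going below the target count
        n_drop = min(step, support.sum() - n_features_to_select)
        weakest = features[np.argsort(importance)[:n_drop]]
        support[weakest] = False
        # Every eliminated feature moves one rank further from 1
        ranking[~support] += 1
    return support, ranking


# Synthetic data: features 0 and 2 matter most, feature 1 a little, feature 3 not at all
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 5 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + 3 * X_demo[:, 2] + 0.01 * rng.normal(size=200)

support, ranking = manual_rfe(X_demo, y_demo, n_features_to_select=2)
reference = RFE(LinearRegression(), n_features_to_select=2, step=1).fit(X_demo, y_demo)

print(support, ranking)                        # hand-written loop
print(reference.support_, reference.ranking_)  # scikit-learn's RFE
```

In this example feature 3 is eliminated first (rank 3), then feature 1 (rank 2), and features 0 and 2 are kept. Both versions should agree.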



In [1]:
# Load the standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the crime dataset from the CSV file
crime_df = pd.read_csv("crime_data.csv")

# Display the first five rows of the dataframe
crime_df.head()

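|   | X1 | X2 | X3 | X4 | X5 | X6 | X7 |
|---|----|----|----|----|----|----|----|
| 0 | 478 | 184 | 40 | 74 | 11 | 31 | 20 |
| 1 | 494 | 213 | 32 | 72 | 11 | 43 | 18 |
| 2 | 643 | 347 | 57 | 70 | 18 | 16 | 16 |
| 3 | 341 | 565 | 31 | 71 | 11 | 25 | 19 |
| 4 | 773 | 327 | 67 | 72 | 9 | 29 | 24 |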


In [6]:
# Import the modules needed for RFE, support vector regression (SVR), and multiple linear regression

from sklearn.feature_selection import RFE
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
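RFE compares raw coefficient magnitudes, so it is sensitive to feature scale. In the `head()` output above, X2 is in the hundreds while X5 is around 10, so coefficients fitted on the unscaled columns partly reflect units rather than importance. One common remedy, which is an assumption here and not something the original analysis does, is to standardize the features with `StandardScaler` before running RFE. The sketch below uses synthetic data and swaps in `LinearRegression` for `SVR` to keep the result deterministic. It shows that rescaling can change which feature RFE keeps. `kept_feature` is a hypothetical helper.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_rows = 200
small = rng.normal(size=n_rows)          # spread about 1
large = 1000 * rng.normal(size=n_rows)   # spread about 1000
noise_feature = rng.normal(size=n_rows)  # unrelated to the target
X_demo = np.column_stack([small, large, noise_feature])

# Per unit, `small` has the bigger coefficient (2 vs 0.01), but one standard
# deviation of `large` moves the target by about 10, versus about 2 for `small`
y_demo = 2 * small + 0.01 * large + 0.001 * rng.normal(size=n_rows)


def kept_feature(X, y):
    """Hypothetical helper: index of the single feature RFE keeps."""
    rfe = RFE(LinearRegression(), n_features_to_select=1).fit(X, y)
    return int(np.flatnonzero(rfe.support_)[0])


X_scaled = StandardScaler().fit_transform(X_demo)
print("unscaled keeps feature", kept_feature(X_demo, y_demo))
print("standardized keeps feature", kept_feature(X_scaled, y_demo))
```

On the crime data, the equivalent step would be to fit RFE on `StandardScaler().fit_transform(X)` and map `support_` back to `X.columns`. Whether that changes the X3/X5 selection has not been checked here.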


In [8]:
# Use a support vector regressor with a linear kernel as the RFE estimator:
# RFE ranks features by the fitted coef_, which only a linear kernel provides
estimator = SVR(kernel="linear")

# Predict X1 from the remaining columns (X2 through X7)
X = crime_df.drop(['X1'], axis=1)
y = crime_df['X1']

# Run RFE once for each number of features to keep, from 1 to 5
for n in range(1, 6):
    selector = RFE(estimator, n_features_to_select=n, step=1)
    selector = selector.fit(X, y)

    # Print n, the boolean support mask, and the feature rankings (1 = selected)
    print(n)
    print("Selector Support for:", selector.support_)
    ranking = selector.ranking_
    print("Selector Ranking", ranking)

    # Map the support mask back to the names of the selected columns
    selected = list(X.columns[selector.support_])

    # Print the list to check that everything is working as expected
    print(selected)

    # Refit an ordinary multiple linear regression on only the selected columns;
    # the target is still the X1 column
    Xi = crime_df[selected]
    regression = LinearRegression()
    regression.fit(Xi, y)

    # LinearRegression.score returns the coefficient of determination (R^2)
    # on the training data, not a classification accuracy
    r2 = regression.score(Xi, y)
    print("accuracy score:", r2)
    print("regression coefficients:", regression.coef_)
    

1
Selector Support for: [False False False  True False False]
Selector Ranking [6 2 3 1 4 5]
['X5']
accuracy score: 0.10401829368952764
regression coefficients: [15.73779528]
2
Selector Support for: [False  True False  True False False]
Selector Ranking [5 1 2 1 3 4]
['X3', 'X5']
accuracy score: 0.31139468387676095
regression coefficients: [10.19259525  8.452784  ]
3
Selector Support for: [False  True  True  True False False]
Selector Ranking [4 1 1 1 2 3]
['X3', 'X4', 'X5']
accuracy score: 0.327541924770107
regression coefficients: [11.28598289 -4.76074061  3.44072147]
4
Selector Support for: [False  True  True  True  True False]
Selector Ranking [3 1 1 1 1 2]
['X3', 'X4', 'X5', 'X6']
accuracy score: 0.3311804211127022
regression coefficients: [11.3285181  -4.27441845  6.28050796  1.58262326]
5
Selector Support for: [False  True  True  True  True  True]
Selector Ranking [2 1 1 1 1 1]
['X3', 'X4', 'X5', 'X6', 'X7']
accuracy score: 0.33360272236731714
regression coefficients: [10.980670

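We can see that including more than two features leads to diminishing returns. R² (printed as "accuracy score" above) rises from 0.10 with one feature to 0.31 with two, but only to 0.33 with three, four, or five. Each RFE subset here contains the previous one, and training R² can never decrease when a feature is added to a linear regression, so small gains are expected even from features that add little real predictive value. Since each additional feature also increases the complexity of the model, I would include just two features (columns X3 and X5). An increase in R² from 0.31 to 0.33 is not enough to justify the extra features and complexity.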
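Because training R² never decreases as features are added, a score that penalizes model size makes this trade-off explicit. A minimal sketch of adjusted R², $1 - (1 - R^2)\frac{n - 1}{n - p - 1}$ for $n$ rows and $p$ features, follows. `adjusted_r2` is a hypothetical helper. The crime dataset's row count is not shown in this notebook, so no adjusted values are computed for it here. Cross-validated selection with scikit-learn's `RFECV` would be another option.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def adjusted_r2(r2, n, p):
    """Hypothetical helper: adjusted R^2 for p features fitted on n rows."""
    if n <= p + 1:
        raise ValueError("need more rows than features + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Synthetic check: adding a pure-noise feature cannot lower training R^2,
# while adjusted R^2 charges a penalty for the extra feature
rng = np.random.default_rng(2)
x_signal = rng.normal(size=(30, 1))
y = 3 * x_signal[:, 0] + rng.normal(size=30)
x_both = np.column_stack([x_signal, rng.normal(size=30)])

r2_one = LinearRegression().fit(x_signal, y).score(x_signal, y)
r2_two = LinearRegression().fit(x_both, y).score(x_both, y)
print("1 feature: R^2", r2_one, "adjusted", adjusted_r2(r2_one, 30, 1))
print("2 features: R^2", r2_two, "adjusted", adjusted_r2(r2_two, 30, 2))
```

Applied to the loop above, one would call `adjusted_r2(r2, len(crime_df), n)` for each `n`.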

**2. Create a function called `digital_root` that takes in an integer. The digital root is the recursive sum of all the digits in a number. Given n, take the sum of the digits of n. If that value has more than one digit, continue reducing in this way until a single-digit number is produced. The input will be a non-negative integer.**

**Examples:**

- 16 --> 1+6=7
- 942 --> 9+4+2=15 --> 1+5=6
- 132189 --> 1+3+2+1+8+9=24 --> 2+4=6
- 493193 --> 4+9+3+1+9+3=29 --> 2+9=11 --> 1+1=2

In [None]:
# Planning notes
# Recursive: the function calls itself on the sum of the digits until one digit is left

# Option 1: convert the number to a string, then convert each character back to an integer

# Option 2: break the number into its digits by place value using division by 10,
# e.g. 16 is 1 ten and 6 ones
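The planning notes mention both recursion and splitting a number into digits with division by 10. A minimal sketch that combines the two follows. It is an alternative to the string-based loop below, not the notebook's own solution. It uses `divmod(n, 10)` to peel off the last digit. `digit_sum` and `digital_root_recursive` are hypothetical names, and both assume a non-negative integer input.

```python
def digit_sum(n):
    """Sum of the decimal digits of a non-negative integer, computed recursively."""
    if n < 10:
        return n
    rest, last = divmod(n, 10)  # e.g. divmod(942, 10) -> (94, 2)
    return last + digit_sum(rest)


def digital_root_recursive(n):
    """Keep summing digits (recursively) until a single digit remains."""
    if n < 10:
        return n
    return digital_root_recursive(digit_sum(n))


for value in (16, 942, 132189, 493193):
    print(value, "->", digital_root_recursive(value))
```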

In [55]:
def digital_root(n):
    # Check the type first so that non-numeric input never reaches the comparison below
    if not isinstance(n, int):
        raise TypeError("Function can only accept integers")
    # Check that n is non-negative
    if n < 0:
        raise ValueError("Function can only accept non-negative integers")
    # Convert n to its list of digits and sum them, repeating until one digit is left
    while n >= 10:
        digits = [int(x) for x in str(n)]
        n = sum(digits)
    # Return the single-digit result as an integer
    return n



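# Test cases

# Negative integer: should raise an error
# digital_root(-10)

# Non-integer: should raise an error
# digital_root(12.51)

# Examples given in the question (expected digital roots: 7, 6, 6, 2)
# digital_root(16)
# digital_root(942)
# digital_root(132189)
# digital_root(493193)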
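Because $10 \equiv 1 \pmod 9$, every number leaves the same remainder mod 9 as its digit sum. Repeated digit summing can therefore be replaced by a closed form: the digital root is 0 for $n = 0$ and $1 + (n - 1) \bmod 9$ otherwise. The sketch below checks that formula against brute-force digit summing. The helpers `digital_root_closed_form` and `digital_root_brute_force` are hypothetical, and the closed form is not something the question requires.

```python
def digital_root_closed_form(n):
    """Digital root via the mod-9 identity; n must be a non-negative int."""
    return 0 if n == 0 else 1 + (n - 1) % 9


def digital_root_brute_force(n):
    """Reference: repeatedly sum the decimal digits until one digit remains."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n


# Check agreement on the question's examples and on every n below 10,000
examples = {16: 7, 942: 6, 132189: 6, 493193: 2}
for value, expected in examples.items():
    assert digital_root_closed_form(value) == expected
assert all(digital_root_closed_form(k) == digital_root_brute_force(k) for k in range(10_000))
print("closed form agrees with brute force")
```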