## Purpose

This homework is designed to give you practice with scikit-learn.  Please note that this is **NOT** a machine learning course.  Using the library is the important part, not designing 'good' models.  The requirements for this assignment are fairly low.

## Requirements

This is a group assignment.  Take a data set (either one provided or your group project data set) and use scikit-learn to train models on some aspect of it.

Some data sets may not look like something you would use ML to solve in a 'real life' situation, but again, this is just for practice.  The models may not come out useful, and that's okay.

Each student in the group should do 2 ML implementations using scikit-learn.  Since there are likely fewer applicable algorithms than required implementations, look at different slices of the data (see the help video).
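As a rough illustration of what "different slices" could look like, here is a minimal sketch that trains two different scikit-learn classifiers on two different subsets of columns.  The DataFrame, the column names (`feature_a`, `feature_b`, `feature_c`, `target`) and the helper `fit_and_score` are made up for this example; swap in your own data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in data: replace with your own DataFrame.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
    "feature_c": rng.normal(size=200),
})
# Made-up binary target that depends on feature_a and feature_b.
df["target"] = (df["feature_a"] + 0.5 * df["feature_b"] > 0).astype(int)


def fit_and_score(model, feature_columns):
    """Train `model` on one slice of columns and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[feature_columns], df["target"], test_size=0.25, random_state=0
    )
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)


# Implementation 1: k-nearest neighbors on one slice of the data.
knn_accuracy = fit_and_score(
    KNeighborsClassifier(n_neighbors=5), ["feature_a", "feature_b"]
)

# Implementation 2: logistic regression on a different slice.
logreg_accuracy = fit_and_score(LogisticRegression(), ["feature_b", "feature_c"])

print(f"KNN accuracy: {knn_accuracy:.2f}")
print(f"Logistic regression accuracy: {logreg_accuracy:.2f}")
```

The second slice leaves out a column the target depends on, so its accuracy will likely be lower.  That is fine for this assignment.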


## Required Hand-in

Hand in one notebook that follows the best practices I've outlined.  This homework is graded as a group homework.  The data set you pick for this practice can be either one I'm providing as part of the repo or your group project's.

Please label each implementation with its original author (in a code comment above the implementation).

Do not use the .todo as your template.  Analysis of each model's performance should be minimal (see one example in block 10 of https://github.com/TheDarkTrumpet/BAIS-6040-0EXP-Sum2021/blob/master/Notebooks/02-Analysis/09.03.01-Classification.ipynb ).
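For a sense of scale, a minimal analysis might be nothing more than an accuracy score and a confusion matrix.  The sketch below uses made-up data and is not necessarily what the linked notebook does; it only shows two standard `sklearn.metrics` calls.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Made-up data: two numeric features, binary target driven by the first one.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)
model = LogisticRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# Fraction of test rows predicted correctly.
accuracy = accuracy_score(y_test, predictions)
# Rows are the true class, columns are the predicted class.
matrix = confusion_matrix(y_test, predictions, labels=[0, 1])

print(f"Accuracy: {accuracy:.2f}")
print(matrix)
```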

I do recommend that you lean on whoever in your group has a bit more knowledge of ML concepts to pick the implementation that appears to yield the best results.  If you're using your group data set, this implementation can then be copied/pasted into the group project.

## Other notes

This homework will be graded as a group, meaning you will all get the same grade, regardless of whether a specific student's implementation is poorly done.  It is worth 75 points.  I strongly recommend you discuss as a group who will do what, then meet up at least a few days before the assignment is due to do a code review and merge the individual notebooks.

In [1]:
import pandas as pd
import numpy as np
from pytrends.request import TrendReq
from seaborn import load_dataset
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

In [2]:
pytrends = TrendReq(hl='en-US', tz=360)

# Build the list of keywords; in this case only "Tesla" is used
kw_list = ["Tesla"]

# build the payload
pytrends.build_payload(kw_list, timeframe='2021-04-01 2021-06-30', geo='US')

# Store the interest-over-time data in a DataFrame and rename the Tesla column to "Search Interest"
teslaTrendsdf = pytrends.interest_over_time()
teslaTrendsdf = teslaTrendsdf.rename(columns={'Tesla': 'Search Interest'})
# Replace the date index with a plain 0..n-1 index (the dates themselves are discarded)
teslaTrendsdf.reset_index(inplace=True, drop=True)
teslaTrendsdf

Unnamed: 0,Search Interest,isPartial
0,80,False
1,75,False
2,66,False
3,69,False
4,87,False
...,...,...
86,67,False
87,63,False
88,69,False
89,67,False


In [3]:
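# Load daily price data. Note: the file is DOGE-USD.csv, even though the variable name refers to Tesla stock.
telsaStockdf = pd.read_csv("https://raw.githubusercontent.com/atlas125gev/StockProject/main/Andrew/DOGE-USD.csv")
#telsaStockdf.set_index('Date', inplace=True)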
telsaStockdf

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2021-04-01,0.053655,0.070111,0.053644,0.061986,0.061986,5816046822
1,2021-04-02,0.061968,0.062249,0.057333,0.057664,0.057664,2166925111
2,2021-04-03,0.057658,0.059484,0.055804,0.055804,0.055804,1136931403
3,2021-04-04,0.055776,0.058107,0.055295,0.057404,0.057404,938035097
4,2021-04-05,0.057411,0.060153,0.056435,0.059696,0.059696,1513832721
...,...,...,...,...,...,...,...
86,2021-06-26,0.237673,0.255127,0.230972,0.244784,0.244784,2649457302
87,2021-06-27,0.246045,0.266891,0.240894,0.264450,0.264450,2167521670
88,2021-06-28,0.264918,0.266982,0.250762,0.256857,0.256857,1932994784
89,2021-06-29,0.257061,0.274940,0.252988,0.262769,0.262769,2192562738
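
The trends frame above no longer carries its dates, so any later combination with the price frame depends on both frames having exactly the same rows in the same order.  A possibly safer alternative is to keep the date and join on it.  The sketch below uses small made-up frames; it assumes the trends data is indexed by a datetime index named `date` (as `interest_over_time()` appears to return) and that the price `Date` column holds strings.

```python
import pandas as pd

# Made-up stand-in for the trends frame, indexed by date.
trends = pd.DataFrame(
    {"Search Interest": [80, 75, 66, 69], "isPartial": [False] * 4},
    index=pd.to_datetime(
        ["2021-04-01", "2021-04-02", "2021-04-03", "2021-04-04"]
    ).rename("date"),
)

# Made-up stand-in for the price frame; one day is missing on purpose.
prices = pd.DataFrame({
    "Date": ["2021-04-01", "2021-04-02", "2021-04-04"],
    "Close": [0.061986, 0.057664, 0.057404],
})

# Keep the date as a regular column instead of dropping it.
trends_with_date = trends.reset_index().rename(columns={"date": "Date"})
# Parse the string dates so both key columns are datetimes.
prices["Date"] = pd.to_datetime(prices["Date"])

# Inner join: only days present in both frames survive.
merged = prices.merge(trends_with_date, on="Date", how="inner")
print(merged)
```

With a positional `pd.concat(..., axis=1)`, the missing day would silently pair the third price row with the wrong search-interest value.  The date-keyed join avoids that.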


In [6]:
# Place the two frames side by side, aligned by row position (both use a default 0..n-1 index)
mergedStockPrice = pd.concat([telsaStockdf, teslaTrendsdf], axis=1)

meanSearchInterest = mergedStockPrice['Search Interest'].mean()
mergedStockPrice["Interest Points Away From Mean"] = mergedStockPrice["Search Interest"] - meanSearchInterest
# Close minus Open, so a positive value means the price rose over the day
mergedStockPrice["Price Increase Points"] = mergedStockPrice["Close"] - mergedStockPrice["Open"]
mergedStockPrice["Price Increase"] = mergedStockPrice["Price Increase Points"] > 0.0

mergedStockPrice

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Search Interest,isPartial,Interest Points Away From Mean,Price Increase Points,Price Increase
0,2021-04-01,0.053655,0.070111,0.053644,0.061986,0.061986,5816046822,80,False,8.417582,0.008331,True
1,2021-04-02,0.061968,0.062249,0.057333,0.057664,0.057664,2166925111,75,False,3.417582,-0.004304,False
2,2021-04-03,0.057658,0.059484,0.055804,0.055804,0.055804,1136931403,66,False,-5.582418,-0.001854,False
3,2021-04-04,0.055776,0.058107,0.055295,0.057404,0.057404,938035097,69,False,-2.582418,0.001628,True
4,2021-04-05,0.057411,0.060153,0.056435,0.059696,0.059696,1513832721,87,False,15.417582,0.002285,True
...,...,...,...,...,...,...,...,...,...,...,...,...
86,2021-06-26,0.237673,0.255127,0.230972,0.244784,0.244784,2649457302,67,False,-4.582418,0.007111,True
87,2021-06-27,0.246045,0.266891,0.240894,0.264450,0.264450,2167521670,63,False,-8.582418,0.018405,True
88,2021-06-28,0.264918,0.266982,0.250762,0.256857,0.256857,1932994784,69,False,-2.582418,-0.008061,False
89,2021-06-29,0.257061,0.274940,0.252988,0.262769,0.262769,2192562738,67,False,-4.582418,0.005708,True
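
One way the derived columns could feed the classifiers imported at the top is sketched below.  The frame here is random stand-in data that only reuses the column names `Interest Points Away From Mean` and `Price Increase`; the chronological split (`shuffle=False`) is an assumption about what suits daily data, not something the assignment requires.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Random stand-in data with the same column names as the merged frame.
rng = np.random.default_rng(42)
n_days = 90
frame = pd.DataFrame({
    "Interest Points Away From Mean": rng.normal(scale=8.0, size=n_days),
    "Price Increase": rng.random(n_days) > 0.5,
})

features = frame[["Interest Points Away From Mean"]]
target = frame["Price Increase"]

# shuffle=False keeps the rows in order: train on earlier days, test on later ones.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, shuffle=False
)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic regression": LogisticRegression(),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the held-out days
    print(f"{name}: {scores[name]:.2f}")
```

Because the stand-in target is random, accuracy near 0.5 is expected here.  On real data the number may not be much better, which is acceptable for this practice.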


In [7]:
mergedStockPrice.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 91 entries, 0 to 90
Data columns (total 12 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Date                            91 non-null     object 
 1   Open                            91 non-null     float64
 2   High                            91 non-null     float64
 3   Low                             91 non-null     float64
 4   Close                           91 non-null     float64
 5   Adj Close                       91 non-null     float64
 6   Volume                          91 non-null     int64  
 7   Search Interest                 91 non-null     int64  
 8   isPartial                       91 non-null     bool   
 9   Interest Points Away From Mean  91 non-null     float64
 10  Price Increase Points           91 non-null     float64
 11  Price Increase                  91 non-null     bool   
dtypes: bool(2), float64(7), int64(2), obje