<a href="https://colab.research.google.com/github/AdrianDiez/Maingear-Estimator/blob/main/Maingear_Estimates.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Maingear Estimator

Given how little data is available, some research suggested that logistic regression would be the most adequate approach, considering the simplicity of the task and the size of the dataset.
Source: [The Best Classifier for Small Datasets: Log-F(m,m) Logit](https://medium.com/@remycanario17/log-f-m-m-logit-the-best-classification-algorithm-for-small-datasets-fc92fd95bc58)
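This notebook excerpt covers data preparation only and does not show the model itself. As a hedged sketch of the technique the linked article names (not the author's implementation), here is logistic regression with an independent log-F(m, m) prior on each slope, fitted by Newton-Raphson with step halving in NumPy. The helper name `fit_logf_logit`, the toy data, and the default `m = 1` are assumptions; in this project the 0/1 outcome might be something like a hypothetical "assembled within N days" flag.

```python
import numpy as np

def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)

def sigmoid(z):
    """Logistic function written via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def fit_logf_logit(X, y, m=1.0, max_iter=100, tol=1e-10):
    """Hypothetical helper: logistic regression with an independent
    log-F(m, m) prior on each slope, fitted by damped Newton-Raphson.

    X: (n, k) features WITHOUT an intercept column; y: (n,) 0/1 outcomes.
    Returns [intercept, slope_1, ..., slope_k]; the intercept is not penalized.
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    pen = np.r_[0.0, np.ones(X.shape[1] - 1)]  # 0 for the intercept, 1 for slopes

    def objective(beta):
        eta = X @ beta
        loglik = np.sum(y * eta - log1pexp(eta))
        # log-F(m, m) log-density of each slope, up to a constant
        logprior = np.sum(pen * (0.5 * m * beta - m * log1pexp(beta)))
        return loglik + logprior

    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p, q = sigmoid(X @ beta), sigmoid(beta)
        grad = X.T @ (y - p) + pen * m * (0.5 - q)
        # Negative Hessian: positive definite whenever m > 0
        info = X.T @ (X * (p * (1.0 - p))[:, None]) + np.diag(pen * m * q * (1.0 - q))
        step = np.linalg.solve(info, grad)
        # Halve the step until the penalized log-likelihood does not decrease
        t, current = 1.0, objective(beta)
        while objective(beta + t * step) < current and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta

# Toy, perfectly separated data: the unpenalized MLE slope would diverge
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(fit_logf_logit(X, y, m=1.0))
print(fit_logf_logit(X, y, m=4.0))  # stronger prior, smaller slope
```

On separated data like this, ordinary maximum likelihood has no finite solution, while the log-F(m, m) penalty keeps the estimates finite; a larger `m` shrinks the slopes more strongly, which is the property that makes the approach attractive for small datasets.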

## Dependencies
The first step is to gather all necessary dependencies. In this case, we read the data from a Google Sheets spreadsheet and convert it as needed.

Documentation: [Read from sheets](https://developers.google.com/sheets/api/quickstart/python)

In [None]:
!pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

Collecting google-api-python-client
  Downloading https://files.pythonhosted.org/packages/5f/02/ae0c3aa746e2f9574727875e5110700a51f2aa1877c98b78433ad76630aa/google_api_python_client-2.2.0-py2.py3-none-any.whl (7.0MB)
Collecting google-auth-httplib2
  Downloading https://files.pythonhosted.org/packages/ba/db/721e2f3f32339080153995d16e46edc3a7657251f167ddcb9327e632783b/google_auth_httplib2-0.1.0-py2.py3-none-any.whl
Requirement already up-to-date: google-auth-oauthlib in /usr/local/lib/python3.7/dist-packages (0.4.4)
ERROR: earthengine-api 0.1.260 has requirement google-api-python-client<2,>=1.12.1, but you'll have google-api-python-client 2.2.0 which is incompatible.
Installing collected packages: google-auth-httplib2, google-api-python-client
  Found existing installation: google-auth-httplib2 0.0.4
    Uninstalling google-auth-httplib2-0.0.4:
      Successfully uninstalled google-auth-httplib2-0.0.4
  Found

In [30]:
import pandas as pd
import numpy as np
import gspread

from google.colab import auth
from oauth2client.client import GoogleCredentials

In [52]:
dataset_url = 'https://docs.google.com/spreadsheets/d/1z6sD5_iGArHKal-hdd2BZkHuPhRJ47Xfu3ET1D0fj0E/edit?ts=60463182#gid=1539062128' 
tab_name = 'All Builds'
critical_columns = ['Rig', 'APEX?', 'Build', 'CPU', 'GPU']
# If the dataset moves but keeps the same structure, point these variables at the new URL or tab.

In [89]:
auth.authenticate_user()
# This step asks you to open a link and approve access for this notebook. Skip it if you are not comfortable doing so; it is Google's standard authorization flow for Colab.

gc = gspread.authorize(GoogleCredentials.get_application_default())
wb = gc.open_by_url(dataset_url) 
sheet = wb.worksheet(tab_name) 

data = sheet.get_all_values()[2:] # We skip the first two rows, no data 
orig_df = pd.DataFrame(data[1:]) # Skipping the header
orig_df.columns = data[0] # Setting up the header

orig_df = orig_df.applymap(lambda s: s.lower() if isinstance(s, str) else s) # All to lowercase so I don't go crazy
orig_df = orig_df.applymap(lambda s: s.rstrip('?') if isinstance(s, str) else s) # Removing trailing ? just in case

clean_df = orig_df.replace(r'^\s*$', np.nan, regex=True).dropna(subset=critical_columns) # Converting blanks to NaN and dropping rows with NaN in critical columns (see above)
clean_df['User'] = clean_df['User'].apply(hash) # Hashing usernames :)
clean_df['GPU'] = clean_df['GPU'].str.replace(r'x2$', '', regex=True) # Drop the "x2" suffix so dual-GPU builds (e.g. dual 3090) are pooled with single-GPU ones; too few to model separately
# Note: rows with a blank in any critical column are dropped above, so those records are lost.
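The cleaning steps in the cell above can be tried without a Google account. A minimal sketch on a few made-up rows shaped like the `get_all_values()` output (two filler rows, a header row, then data); every sample value is hypothetical. It swaps `applymap` for a vectorized `.str` pipeline, which assumes every cell arrives as a string, as `get_all_values()` returns them.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like sheet.get_all_values(): two filler rows,
# a header row, then data. All values are made up for illustration.
raw = [
    ["", "", "", "", "", ""],
    ["", "", "", "", "", ""],
    ["User", "Rig", "APEX?", "Build", "CPU", "GPU"],
    ["alice", "Turbo", "Yes", "Custom", "5900X", "3090x2"],
    ["bob", "Vybe", "No?", "Stage 3", "5600X", "3060"],
    ["carol", "R1", "Yes", "Custom", "5950X", "6900XT"],
    ["dave", "Rush", "", "Custom", "3970X", "3090"],  # blank APEX? -> dropped
]
critical_columns = ["Rig", "APEX?", "Build", "CPU", "GPU"]

data = raw[2:]                                # skip the two filler rows
df = pd.DataFrame(data[1:], columns=data[0])  # first remaining row is the header

# Lowercase every value and drop trailing '?' (column names are untouched)
df = df.apply(lambda col: col.str.lower().str.rstrip("?"))

# Blank or whitespace-only cells -> NaN, then drop rows missing a critical field
df = df.replace(r"^\s*$", np.nan, regex=True).dropna(subset=critical_columns)

# Remove only a literal "x2" suffix; str.rstrip("x2") would instead strip any
# trailing run of the characters 'x' and '2'
df["GPU"] = df["GPU"].str.replace(r"x2$", "", regex=True)
print(df)
```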

In [87]:
mask = clean_df['Assembled ?'] == 'true' # Values were lowercased above, so 'TRUE' would never match
completed_df = clean_df[mask]
not_completed_df = clean_df[~mask]
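Because every cell value is lowercased earlier in the notebook, the assembled flag holds the strings `'true'` and `'false'`. A minimal sketch of splitting on that flag, using a few rows adapted from the `clean_df` output in this notebook; the `assembled` variable and the numeric conversion of `Days` are illustrative additions.

```python
import pandas as pd

# Hypothetical lowercased rows, adapted from the clean_df output
df = pd.DataFrame({
    "Rig": ["turbo", "rush", "vybe", "r1"],
    "Assembled ?": ["true", "true", "false", "false"],
    "Days": ["115", "127", "7", "105"],
})

# An uppercase comparison matches nothing once values are lowercased
print((df["Assembled ?"] == "TRUE").sum())  # 0

assembled = df["Assembled ?"].eq("true")  # boolean mask
df["Days"] = pd.to_numeric(df["Days"])    # sheet values arrive as strings

completed_df = df[assembled]
not_completed_df = df[~assembled]
print(completed_df["Days"].mean())  # 121.0
```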

In [86]:
clean_df['User'].apply(hash)

0      1943837010158971918
1      1101663512034280585
2       784391070832263473
3      2242670421929926042
4        52300159483382933
              ...         
150   -1032415116576079081
151   -1032415116576079081
152   -1032415116576079081
153   -1849468256746663057
154    1975133446345287632
Name: User, Length: 153, dtype: int64
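Python's built-in `hash()` salts `str` hashes per interpreter process (unless `PYTHONHASHSEED` is set), so the hashed `User` values change from one Colab session to the next, and `hash()` is not a cryptographic hash. A minimal sketch of a stable pseudonym using `hashlib.sha256` with a private salt; the `pseudonymize` name, the salt value, and the 16-character truncation are assumptions.

```python
import hashlib

# Hypothetical salt: keep the real value out of any shared copy of the notebook
SALT = "replace-with-a-private-salt"

def pseudonymize(username: str, salt: str = SALT) -> str:
    """Stable pseudonym: the same username and salt give the same output in every session."""
    digest = hashlib.sha256((salt + username).encode("utf-8")).hexdigest()
    return digest[:16]  # shortened for readability

# Usage on the notebook's DataFrame (column name as in the notebook):
# clean_df["User"] = clean_df["User"].map(pseudonymize)
```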

In [90]:
clean_df

| | User | Rig | APEX? | Paint? | Build | CPU | GPU | Ordered Date | Assembled ? | Completed Date | Days |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8360686874574768185 | turbo | yes | no | custom | 5900x | 3080 | 9/15/2020 | true | 1/8/2021 | 115 |
| 1 | 1101663512034280585 | turbo | yes | no | custom | 5950x | 3090 | 9/21/2020 | true | 2/8/2021 | 140 |
| 2 | -8804388687582014717 | rush | yes | no | custom | 3970x | 3090 | 9/22/2020 | true | 1/27/2021 | 127 |
| 3 | 4437977073109368498 | turbo | yes | no | custom | 5900x | 3090 | 9/22/2020 | true | 2/8/2021 | 139 |
| 5 | -4288398554463419995 | rush | yes | no | custom | 5950x | 3090 | 9/25/2020 | true | 2/4/2021 | 132 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 150 | -7949944144217160934 | vybe | no | no | stage 3 | 5600x | 3060 | 4/7/2021 | false | | 7 |
| 151 | -7949944144217160934 | vybe | no | no | stage 4 | 5900c | 3080 | 4/8/2021 | false | | 6 |
| 152 | -7949944144217160934 | r1 | yes | no | custom | 5950x | 6900xt | 12/30/2020 | false | | 105 |
| 153 | 1375650755876427741 | vybe | no | no | custom | 5900x | 3080 | 12/18/2020 | false | | 117 |
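The displayed rows suggest how `Days` is derived: for assembled builds it equals `Completed Date` minus `Ordered Date`, and for open orders it matches the days elapsed up to 2021-04-14. That snapshot date is inferred from the rows shown, not stated in the source. A sketch that recomputes the column under that assumption, using four rows copied from the output above; `AS_OF` and the `Days (recomputed)` column name are illustrative.

```python
import pandas as pd

# Four rows copied from the clean_df output (index 0, 1, 150, 152)
df = pd.DataFrame({
    "Ordered Date": ["9/15/2020", "9/21/2020", "4/7/2021", "12/30/2020"],
    "Assembled ?": ["true", "true", "false", "false"],
    "Completed Date": ["1/8/2021", "2/8/2021", None, None],
    "Days": [115, 140, 7, 105],
})

# Assumption: open orders are counted up to a fixed snapshot date,
# inferred from the table rather than documented in the source
AS_OF = pd.Timestamp("2021-04-14")

ordered = pd.to_datetime(df["Ordered Date"], format="%m/%d/%Y")
completed = pd.to_datetime(df["Completed Date"], format="%m/%d/%Y")  # None -> NaT

# Use the completion date for assembled builds, the snapshot date otherwise
end = completed.where(df["Assembled ?"].eq("true"), AS_OF)
df["Days (recomputed)"] = (end - ordered).dt.days
print(df[["Days", "Days (recomputed)"]])
```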
