<a href="https://colab.research.google.com/github/AryaJ3365/Investment-Prediction-Application/blob/main/Venture_Selection_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Venture Selection Model**

##**Load Data**



This command installs the anvil-uplink library into the Google Colab environment, which lets us deploy the notebook as a web application that anyone with access can use.

In [1]:
!pip install anvil-uplink

Collecting anvil-uplink
  Downloading anvil_uplink-0.4.2-py2.py3-none-any.whl (90 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 90.1/90.1 kB 2.4 MB/s eta 0:00:00
Collecting argparse (from anvil-uplink)
  Downloading argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Collecting ws4py (from anvil-uplink)
  Downloading ws4py-0.5.1.tar.gz (51 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 51.4/51.4 kB 3.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: ws4py
  Building wheel for ws4py (setup.py) ... done
  Created wheel for ws4py: filename=ws4py-0.5.1-py3-none-any.whl size=45228 sha256=a82aa52c262ce260c7d26934f3542d2a44abbc12ffa70ef8b27fb6de42741e3f
  Stored in directory: /root/.cache/pip/wheels/2e/7c/ad/d9c746276b

Here we import the anvil.server module and connect using our server uplink key, which lets the Anvil web app call functions defined in this notebook.

In [2]:
import anvil.server
anvil.server.connect("server_XXRUOSEAF3ZUIZ2AROYMMQ5L-QLE4ORLXEQ5TQIGI")

Connecting to wss://anvil.works/uplink
Anvil websocket open
Connected to "Default Environment" as SERVER


###Import the required Python packages: the SVM classifier, NumPy, pandas, and StandardScaler

In [3]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

Here we read in the CSV file of past successful and unsuccessful investments, which I compiled through online research, to train the machine learning model. Note: In the future I hope to build the data set entirely from startup investment data gathered through WiProsper.

In [4]:
df = pd.read_csv("https://raw.githubusercontent.com/AryaJ3365/Investment-Prediction-Application/main/Regression3.csv")

##**Data Preparation**

###Analysis of the data

First we look at the head of the data set, which shows what the first five rows look like.

Here is the key for the data set:

1.   Investment Name: The name of the company invested in.
2.   Date: The date at which we are looking at the investment.
3.   Successful: The likelihood of the investment being successful, expressed as an integer percentage. It is used only to derive the letter grades and is dropped from the features before training.

KPIs (Key Performance Indicators):
1.   Free Cash Flow: Metric which tells how much money a company is actually generating.
2.   Return on Investment: Measures the gains or losses generated by an investment.
3.   Debt-to-Equity: Indicates how much debt a company is taking on. Calculated by dividing total liabilities by shareholders' equity, both of which can be found on the balance sheet.
4.   P/E Ratio: A company's current market capitalization divided by its annual earnings.




In [5]:
df.head()

| | Investment Name | Date | Free Cash Flow | Return On Investment | Debt-to-Equity | P/E Ratio | Successful |
|---|---|---|---|---|---|---|---|
| 0 | Tesla | 12/31/2009 | -92.71 | 92.42 | -0.15 | 56.57 | 100 |
| 1 | Visa | 12/31/2009 | 252.0 | 30.99 | 0.0 | 23.47 | 93 |
| 2 | Nvidia | 12/31/2009 | 410.21 | 25.63 | 0.01 | 30.06 | 97 |
| 3 | Netflix | 12/31/2009 | 279.13 | 31.7 | 1.19 | 27.82 | 96 |
| 4 | Peloton | 3/31/2023 | -2373.3 | -119.05 | -13.21 | 0.0 | 40 |


Shows the number of rows and columns in the dataset, denoted as (rows, columns).




In [6]:
df.shape

(42, 7)

Gives common statistical values of numerical columns within the data set.




In [7]:
df.describe()

| | Free Cash Flow | Return On Investment | Debt-to-Equity | P/E Ratio | Successful |
|---|---|---|---|---|---|
| count | 42.0 | 42.0 | 42.0 | 42.0 | 42.0 |
| mean | 1513.397143 | 1.242619 | -0.695238 | 13.772619 | 70.047619 |
| std | 4258.461688 | 52.402249 | 5.396165 | 11.441841 | 22.100604 |
| min | -6000.0 | -262.5 | -22.03 | 0.0 | 20.0 |
| 25% | 26.915 | 1.985 | 0.035 | 6.925 | 63.75 |
| 50% | 363.385 | 7.25 | 0.545 | 12.06 | 74.5 |
| 75% | 809.4325 | 18.36 | 1.62 | 18.795 | 87.0 |
| max | 17656.0 | 92.42 | 3.73 | 56.57 | 100.0 |


Each investment is assigned an initial letter grade based on its success percentage. These grades serve as the labels that show the model what the results look like. The model then predicts the grades of the testing set on its own, without seeing them, so we can check whether the application is working accurately.




In [8]:
grades = []

# Map each success percentage to a letter grade.
for i in df['Successful']:
    if i >= 90:
        grades.append('A')
    elif i >= 80:
        grades.append('B')
    elif i >= 70:
        grades.append('C')
    elif i >= 60:
        grades.append('D')
    else:
        grades.append('F')

df['Grades'] = grades
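
The same grading rule can also be written without an explicit loop. The sketch below is one possible alternative, not the notebook's method: it uses `pd.cut` with left-closed bins on a small hypothetical `scores` series (the first five `Successful` values shown above plus a few boundary cases).

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first five 'Successful' values from df.head(),
# plus boundary values to show which side each cutoff falls on.
scores = pd.Series([100, 93, 97, 96, 40, 90, 89, 60, 59])

# right=False makes each bin left-closed: [60, 70) -> 'D', [90, inf) -> 'A', etc.
grades = pd.cut(
    scores,
    bins=[-np.inf, 60, 70, 80, 90, np.inf],
    labels=['F', 'D', 'C', 'B', 'A'],
    right=False,
)

print(grades.tolist())
```

Left-closed bins mirror the `>=` comparisons in the loop, so a score of exactly 90 is an A and 89 is a B.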

The count of each letter grade currently in the data set.



In [9]:
df['Grades'].value_counts()

C    11
A     9
F     9
D     7
B     6
Name: Grades, dtype: int64

##**Data Preparation**

###Column Dropping

We drop the "Investment Name" and "Date" columns since they are not numerical data that the SVM classifier can use. We also remove the "Successful" column since we don't want our ML model to see the actual percentages, which would give it an unfair advantage.



In [10]:
df = df.drop(columns='Investment Name')

In [11]:
df = df.drop(columns='Date')

In [12]:
df = df.drop(columns='Successful')
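
The three drops above can also be done in a single call by passing a list to `columns`. A minimal sketch on a hypothetical two-row frame that reuses some of the notebook's column names:

```python
import pandas as pd

# Hypothetical two-row frame built from the first two rows of df.head().
sample = pd.DataFrame({
    'Investment Name': ['Tesla', 'Visa'],
    'Date': ['12/31/2009', '12/31/2009'],
    'Free Cash Flow': [-92.71, 252.0],
    'Successful': [100, 93],
})

# Drop all non-feature columns at once.
kpis_only = sample.drop(columns=['Investment Name', 'Date', 'Successful'])
print(kpis_only.columns.tolist())
```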

Here is the head of our new data set, which now contains only the KPI values.

KPIs (Key Performance Indicators):
1.   Free Cash Flow: Metric which tells how much money a company is actually generating.
2.   Return on Investment: Measures the gains or losses generated by an investment.
3.   Debt-to-Equity: Indicates how much debt a company is taking on. Calculated by dividing total liabilities by shareholders' equity, both of which can be found on the balance sheet.
4.   P/E Ratio: A company's current market capitalization divided by its annual earnings.

Note: The Grades column is still present here but will be separated out when we split the data.



In [13]:
df.head()

| | Free Cash Flow | Return On Investment | Debt-to-Equity | P/E Ratio | Grades |
|---|---|---|---|---|---|
| 0 | -92.71 | 92.42 | -0.15 | 56.57 | A |
| 1 | 252.0 | 30.99 | 0.0 | 23.47 | A |
| 2 | 410.21 | 25.63 | 0.01 | 30.06 | A |
| 3 | 279.13 | 31.7 | 1.19 | 27.82 | A |
| 4 | -2373.3 | -119.05 | -13.21 | 0.0 | F |


##**Data Splitting**

###Dataset Transformations

We separate the data into X and Y, where X holds the features (the KPI columns) and Y holds the target labels (the letter grades).

In [14]:
X = df.drop(columns='Grades')
Y = df['Grades']

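Next we use StandardScaler, which rescales each feature to zero mean and unit variance. SVMs are sensitive to feature scale, and here Free Cash Flow spans thousands while Debt-to-Equity ranges only from about -22 to 4, so scaling keeps any one KPI from dominating the SVM classification model that we will be using in the next section.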

In [15]:
scaler = StandardScaler()

In [16]:
scaler.fit(X)

We now apply the fitted scaler to transform the features so they can be fed into our machine learning model.

In [17]:
stdata = scaler.transform(X)

In [18]:
X = stdata
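
To make the transformation concrete: `StandardScaler` replaces each value x with (x - mean) / std, computed per column using the population standard deviation. The sketch below checks this by hand on a small hypothetical array built from the first three Free Cash Flow and P/E Ratio values shown earlier.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical sample: Free Cash Flow and P/E Ratio for the first three rows of df.head().
sample = np.array([
    [-92.71, 56.57],
    [252.00, 23.47],
    [410.21, 30.06],
])

scaled = StandardScaler().fit_transform(sample)

# Manual z-score per column; np.std defaults to the population std (ddof=0),
# which is what StandardScaler uses.
manual = (sample - sample.mean(axis=0)) / sample.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```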

Next we create a testing data set and a training data set using a conventional split: the testing set is about 20% of the original data and the training set about 80%. We stratify on Y so both sets keep roughly the same mix of grades.

In [19]:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y, test_size=0.2, stratify=Y, random_state=2)

We print the shapes of the full, testing, and training data to verify the split.

In [20]:
print(X.shape, X_test.shape, X_train.shape)

(42, 4) (9, 4) (33, 4)
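
The sizes above follow from how `train_test_split` rounds: 20% of 42 is 8.4, which is rounded up to 9 test rows, leaving 33 for training. The sketch below reproduces this with hypothetical placeholder features and a label vector that has the same grade counts as `value_counts()` reported, and prints how the stratified split distributes the grades.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Label vector with the grade counts from the notebook: C=11, A=9, F=9, D=7, B=6.
y = pd.Series(np.repeat(['C', 'A', 'F', 'D', 'B'], [11, 9, 9, 7, 6]))
# Placeholder features (hypothetical); only the shapes matter here.
X_demo = np.zeros((len(y), 4))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y, test_size=0.2, stratify=y, random_state=2
)

print(X_tr.shape, X_te.shape)
print(y_te.value_counts().sort_index())
```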


###SVM Classification Algorithm

Here we create and train the SVM classifier, which will then let us test our ML model.

**Support Vector Machine (SVM) Algorithm:** SVMs separate classes by finding the boundary with the widest margin between them, which can capture complex relationships in a data set. That is why I chose an SVM over algorithms like linear regression, which predicts continuous values rather than classes, or logistic regression.

In [21]:
classifier = svm.SVC(kernel='linear')

For our SVM model I chose a linear kernel. A linear kernel computes the inner product of the feature vectors directly in the original feature space, so we skip the implicit mapping into a higher-dimensional space that kernels such as RBF perform. In theory this leads to a more efficient program.
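
As a small illustration of the inner product a linear kernel computes, the sketch below uses scikit-learn's `linear_kernel` helper on hypothetical standardized rows and confirms it matches a plain matrix product. The values are made up for the example.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical standardized feature rows (3 investments x 4 KPIs).
rows = np.array([
    [ 0.5, -1.2,  0.3,  0.0],
    [-0.7,  0.4,  1.1, -0.5],
    [ 1.0,  0.0, -0.6,  0.8],
])

# The linear kernel is K[i, j] = <rows[i], rows[j]>, with no feature mapping.
K = linear_kernel(rows, rows)

print(np.allclose(K, rows @ rows.T))
```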

In [22]:
classifier.fit(X_train, Y_train)

###Dataset Prediction Accuracy

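Here are the results of the testing. Even without a large data set, the model predicts about 78% of the results correctly on both the training set (26 of 33) and the testing set (7 of 9). This is encouraging, since the data set currently has only 42 entries and I expected scores between 40% and 60%. With only 9 test rows, though, a single prediction shifts the test accuracy by about 11 percentage points, so this estimate is rough.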

In [23]:
train_pred = classifier.predict(X_train)
accuracy = accuracy_score(Y_train, train_pred)

Training dataset score accuracy:

In [24]:
print("Accuracy score of train data = {}".format(accuracy))

Accuracy score of train data = 0.7878787878787878


Testing dataset score accuracy:

In [25]:
test_pred = classifier.predict(X_test)
accuracy2 = accuracy_score(Y_test, test_pred)

In [26]:
print('Accuracy score on test data = {}'.format(accuracy2))

Accuracy score on test data = 0.7777777777777778
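
Because a 9-row test set gives a coarse estimate, one common complement is stratified k-fold cross-validation. The sketch below is an assumption about how that could look, not something the notebook runs: it wraps the scaler and classifier in a pipeline so the scaler is refit on each training fold, and it uses synthetic stand-in data (random features, labels with the notebook's grade counts), so the printed scores say nothing about the real model.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 42 rows, 4 features,
# labels with the same grade counts as the notebook's data set.
y_demo = np.repeat(['C', 'A', 'F', 'D', 'B'], [11, 9, 9, 7, 6])
X_demo = rng.normal(size=(42, 4))

# Scaling inside the pipeline is fit only on each fold's training rows.
model = make_pipeline(StandardScaler(), SVC(kernel='linear'))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

scores = cross_val_score(model, X_demo, y_demo, cv=cv)
print(scores.mean(), scores.std())
```

With the real data, `X_demo` and `y_demo` would be replaced by the unscaled KPI columns and the Grades column.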


##**Investment Prediction**

###Dataset Information and Rules

Here is the data set head for reference.

KPIs (Key Performance Indicators):
1.   Free Cash Flow: Metric which tells how much money a company is actually generating.
2.   Return on Investment: Measures the gains or losses generated by an investment.
3.   Debt-to-Equity: Indicates how much debt a company is taking on. Calculated by dividing total liabilities by shareholders' equity, both of which can be found on the balance sheet.
4.   P/E Ratio: A company's current market capitalization divided by its annual earnings.

Note: Grades column will not be seen by the testing set and is only shown now for reference as to what the prediction features will look like.



In [27]:
df.head(10)

| | Free Cash Flow | Return On Investment | Debt-to-Equity | P/E Ratio | Grades |
|---|---|---|---|---|---|
| 0 | -92.71 | 92.42 | -0.15 | 56.57 | A |
| 1 | 252.0 | 30.99 | 0.0 | 23.47 | A |
| 2 | 410.21 | 25.63 | 0.01 | 30.06 | A |
| 3 | 279.13 | 31.7 | 1.19 | 27.82 | A |
| 4 | -2373.3 | -119.05 | -13.21 | 0.0 | F |
| 5 | 475.0 | -3.4 | -22.03 | 6.57 | D |
| 6 | -4.09 | -262.5 | -2.47 | 0.0 | F |
| 7 | -6000.0 | -48.93 | -4.5 | 0.18 | F |
| 8 | 0.1 | 39.61 | 2.85 | 4.28 | F |
| 9 | 5615.0 | 3.11 | 3.65 | 21.59 | A |


###Data Entry

Here is where you enter the values of the investment you want to predict. In the "FOR YOU TO FILL OUT" section of the code below, edit the variables to match your investment.



The output is the letter grade of the investment. Here is what each possible grade means.

**Letter Grade Scale:**

**A**: (90% - 100%)

**B**: (80% - 89%)

**C**: (70% - 79%)

**D**: (60% - 69%)

**F**: (<60%)


This is where we predict the grade of an investment directly in the notebook. If you plan to use the web application for startup prediction, feel free to ignore this block of code.



In [28]:
#Enter Values of the Startup you want the ML Model to predict:
#***FOR YOU TO FILL OUT***
InvestmentName = "WiProsper"
DateOfInvestment = "6/16/2023"
FreeCashFlow = 140.50
DebtToEquityRatio = 2.1
PE_Ratio = 9.6
ROI = 40
#*************************

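#Features must follow the training column order:
#Free Cash Flow, Return On Investment, Debt-to-Equity, P/E Ratio
#features = (FreeCashFlow, ROI, DebtToEquityRatio, PE_Ratio)

#data_changed = np.asarray(features)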

#data_reshaped = data_changed.reshape(1,-1)

#std_data = scaler.transform(data_reshaped)
#print(std_data)

#prediction = classifier.predict(std_data)
#print(prediction)

###Anvil Web Application Data Entry Function

This is the investment_predict function used in the Anvil web app to allow users to predict startup investments through the web app rather than the notebook.


In [29]:
@anvil.server.callable
def investment_predict(investment_name, date_of_investment, free_cash_flow, debt_to_equity_ratio, pe_ratio, roi):

  # The name and date are accepted from the web form but are not model features.

  # KPI values in the same column order the scaler and classifier were fit on:
  # Free Cash Flow, Return On Investment, Debt-to-Equity, P/E Ratio.
  features = (free_cash_flow, roi, debt_to_equity_ratio, pe_ratio)

  # Reshape the input into a single row and standardize it with the fitted scaler.
  data_changed = np.asarray(features)
  data_reshaped = data_changed.reshape(1,-1)
  std_data = scaler.transform(data_reshaped)

  # Predict the grade of the new investment using the SVM classifier.
  prediction = classifier.predict(std_data)

  return prediction
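
Because the scaler and classifier only see column positions, the input values must be passed in the same order as the training columns (Free Cash Flow, Return On Investment, Debt-to-Equity, P/E Ratio, as shown by `df.head()`). One way to make that order explicit is to build the input row from named values. `FEATURE_ORDER` and `build_feature_row` below are hypothetical helpers, not part of the notebook.

```python
import pandas as pd

# Column order the scaler was fit on, taken from the df.head() output.
FEATURE_ORDER = ['Free Cash Flow', 'Return On Investment', 'Debt-to-Equity', 'P/E Ratio']

def build_feature_row(free_cash_flow, roi, debt_to_equity_ratio, pe_ratio):
    """Return a one-row DataFrame whose columns follow FEATURE_ORDER."""
    values = {
        'Free Cash Flow': free_cash_flow,
        'Return On Investment': roi,
        'Debt-to-Equity': debt_to_equity_ratio,
        'P/E Ratio': pe_ratio,
    }
    return pd.DataFrame([values], columns=FEATURE_ORDER)

# Example values from the notebook's data-entry cell.
row = build_feature_row(free_cash_flow=140.50, roi=40, debt_to_equity_ratio=2.1, pe_ratio=9.6)
print(row)
```

A row built this way could be passed to `scaler.transform(row)` in place of the reshaped array, since the scaler was fit on a DataFrame with these column names.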

This call keeps our notebook running in the background so the Anvil app can continue to call its functions.


In [30]:
anvil.server.wait_forever()

KeyboardInterrupt: ignored