<a href="https://colab.research.google.com/github/AryaJ3365/Investment-Prediction-Application/blob/main/Venture_Selection_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Venture Selection Model**

##**Load Data**



This command installs the anvil-uplink library into our Google Colab environment. The library lets us deploy this notebook as a web application that anyone with access can use.

In [None]:
!pip install anvil-uplink

Collecting anvil-uplink
  Downloading anvil_uplink-0.4.2-py2.py3-none-any.whl (90 kB)
Collecting argparse (from anvil-uplink)
  Downloading argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Collecting ws4py (from anvil-uplink)
  Downloading ws4py-0.5.1.tar.gz (51 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: ws4py
  Building wheel for ws4py (setup.py) ... done
  Created wheel for ws4py: filename=ws4py-0.5.1-py3-none-any.whl size=45228 sha256=6697ed305d5b1b0b2ca7ad32b1422f8c5ba4501826b06e7a47bf0b7927ec48f1
  Stored in directory: /root/.cache/pip/wheels/2e/7c/ad/d9c746276b

Here we import the anvil.server module and connect with our server uplink key, which lets the Anvil web app call the functions defined in this notebook.

In [None]:
import anvil.server
anvil.server.connect("server_XXRUOSEAF3ZUIZ2AROYMMQ5L-QLE4ORLXEQ5TQIGI")

Connecting to wss://anvil.works/uplink
Anvil websocket open
Connected to "Default Environment" as SERVER
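Because the uplink key grants access to the app, one option is to keep it out of the notebook text and read it from an environment variable instead. The sketch below is only an illustration: the variable name ANVIL_UPLINK_KEY and the helper load_uplink_key are assumptions made here, not something Anvil defines.

```python
import os

def load_uplink_key(env_var="ANVIL_UPLINK_KEY"):
    """Return the uplink key stored in an environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Anvil uplink key."
        )
    return key

# Usage in the notebook, once the variable has been set:
# import anvil.server
# anvil.server.connect(load_uplink_key())
```

With this approach the notebook can be shared publicly without exposing the key itself.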


###Import the Python packages we need, such as the SVM classifier, NumPy, pandas, and the standard scaler

In [None]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

Here we read in the CSV file of past successful and unsuccessful investments, which I compiled through online research, to train the machine learning model.

Once the WiProsper CSV contains enough selection-model data, you can comment out my initial dataset using # and enter the link to the WiProsper dataset instead, which in theory should give us the highest prediction accuracy.

In [None]:
# This is the initial dataset link that lets us immediately start predicting venture success.

df = pd.read_csv("https://raw.githubusercontent.com/AryaJ3365/Investment-Prediction-Application/main/Venture_Selection_Dataset.csv")

# Once there is enough data, the WiProsper dataset belongs here. Comment out the dataset line above
# and uncomment the line below.

#df = pd.read_csv("Enter link or file location of WIPROSPER dataset here")
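When swapping in a new dataset, it can help to confirm that it has the same columns as the initial one before running the rest of the notebook. The following is a minimal sketch: the helper name missing_columns is hypothetical, and the expected column names are taken from the df.head() output shown later in this notebook.

```python
# Column names as they appear in the initial dataset's df.head() output.
EXPECTED_COLUMNS = [
    "Investment Name", "Date", "Return On Investment", "Debt-to-Equity",
    "Profit Margin", "Social Media Engagement", "P/E Ratio", "Successful",
]

def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in expected if name not in present]

# Usage in the notebook (df is the loaded DataFrame):
# missing = missing_columns(df.columns)
# if missing:
#     raise ValueError(f"Dataset is missing columns: {missing}")
```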

##**Data Preparation**

###Analysis of the data

First we look at the head of the data set, which shows us what the first five rows look like.

Here is the key for the data set:

1.   Investment Name: The name of the company the investment was made in.
2.   Date: The point in time at which we are looking at the investment.
3.   Successful: The likelihood of the investment being successful, as an integer percentage. It is used to derive the letter grades that label the data, but it is not itself a feature in the training process.

KPIs (Key Performance Indicators) / Variables:
1.   Return on Investment: Used to measure the gains or losses created by an investment.
2.   Debt-to-Equity: Indicates the amount of debt a company is taking on. Calculated by dividing total liabilities by shareholders' equity, both of which can be found on the balance sheet.
3.   Profit Margins: Profit margins provide insights into a company's profitability. Key metrics include gross profit margin (revenue minus cost of goods sold, divided by revenue) and net profit margin (net income divided by revenue). Higher margins suggest efficient operations and pricing power.
4.   Social Media Engagement Rate: For companies with a strong social media presence, monitoring metrics like likes, shares, comments, and followers can help gauge customer engagement, brand awareness, and potential market sentiment.
5.   P/E Ratio: A company's current market capitalization divided by its annual earnings.
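The ratio definitions above can be sketched as small functions. The function names and the company figures are made up for illustration, and expressing the margin as a percentage is an assumption based on the scale of the values in the dataset.

```python
def debt_to_equity(total_liabilities, shareholders_equity):
    """Total liabilities divided by shareholders' equity."""
    return total_liabilities / shareholders_equity

def net_profit_margin(net_income, revenue):
    """Net income divided by revenue, expressed as a percentage."""
    return 100 * net_income / revenue

def pe_ratio(market_cap, annual_earnings):
    """Market capitalization divided by annual earnings."""
    return market_cap / annual_earnings

# Made-up figures (in millions) for a hypothetical company.
print(debt_to_equity(500, 400))      # 1.25
print(net_profit_margin(150, 1000))  # 15.0
print(pe_ratio(3000, 150))           # 20.0
```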




In [None]:
df.head()

Unnamed: 0,Investment Name,Date,Return On Investment,Debt-to-Equity,Profit Margin,Social Media Engagement,P/E Ratio,Successful
0,Tesla,12/31/2009,92.42,0.15,33.25,10,56.57,100
1,Visa,12/31/2009,30.99,0.0,21.45,9,23.47,93
2,Nvidia,12/31/2009,45.63,0.01,24.63,9,30.06,97
3,Netflix,12/31/2009,25.7,0.19,22.53,8,27.82,90
4,Peloton,3/31/2023,4.05,6.21,0.34,5,0.0,40


Shows the number of rows and columns in the dataset. Denoted as: (rows, columns)




In [None]:
df.shape

(59, 8)

Gives common statistical values of numerical columns within the data set.




In [None]:
df.describe()

Unnamed: 0,Return On Investment,Debt-to-Equity,Profit Margin,Social Media Engagement,P/E Ratio,Successful
count,59.0,59.0,59.0,59.0,59.0,59.0
mean,16.761525,1.851864,12.791186,6.830508,13.519153,71.864407
std,14.733907,1.832765,7.589436,2.035483,9.951843,19.769759
min,1.03,0.0,0.14,0.0,0.0,20.0
25%,7.34,0.615,8.91,6.0,7.545,65.0
50%,12.5,1.34,12.77,7.0,12.68,75.0
75%,22.875,2.355,18.955,8.0,17.8,87.0
max,92.42,8.5,33.25,10.0,56.57,100.0


Each investment is given an initial letter grade based on its success rate. These grades act as labels that show the model what correct results look like. The model then predicts the grades of the testing set on its own, without seeing them, so that we can determine whether the application is working accurately.




In [None]:
grades = []

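# Map each success percentage to a letter grade.
for i in df['Successful']:
    if i >= 90:
        grades.append('A')
    elif i >= 80:
        grades.append('B')
    elif i >= 70:
        grades.append('C')
    elif i >= 60:
        grades.append('D')
    else:
        # Everything below 60 is an F, so every row gets exactly one grade.
        grades.append('F')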

df['Grades'] = grades
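The same thresholds can also be written as a small lookup function, which makes the boundaries easy to check in isolation. This is only a sketch of an alternative; the name to_grade is hypothetical and the notebook itself uses the loop above.

```python
from bisect import bisect_right

GRADE_CUTOFFS = [60, 70, 80, 90]   # lower bounds for D, C, B, A
GRADE_LETTERS = "FDCBA"            # one more letter than there are cutoffs

def to_grade(score):
    """Return the letter grade for a success percentage."""
    # bisect_right counts how many cutoffs are <= score.
    return GRADE_LETTERS[bisect_right(GRADE_CUTOFFS, score)]

print([to_grade(s) for s in (100, 90, 89, 75, 60, 59, 20)])
# ['A', 'A', 'B', 'C', 'D', 'F', 'F']
```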

The count of each letter grade currently in the data set.



In [None]:
df['Grades'].value_counts()

C    19
A    11
D    10
B    10
F     9
Name: Grades, dtype: int64

##**Data Preparation**

###Column Dropping

We drop the "Investment Name" and "Date" columns since they are not numerical data that the SVM classifier can use. We also remove the "Successful" column since we don't want our ML model to see the actual percentages, which would give it an unfair advantage.



In [None]:
df = df.drop(columns='Investment Name')

In [None]:
df = df.drop(columns='Date')

In [None]:
df = df.drop(columns='Successful')

Here is the head of our new data set, which now contains only KPI values.

KPIs (Key Performance Indicators) / Variables:
1.   Return on Investment: Used to measure the gains or losses created by an investment.
2.   Debt-to-Equity: Indicates the amount of debt which a company is taking on. Calculated by dividing total liabilities by shareholder's equity which can be found on its balance sheet.
3.   Profit Margins: Profit margins provide insights into a company's profitability. Key metrics include gross profit margin (revenue minus cost of goods sold divided by revenue) and net profit margin (net income divided by revenue). Higher margins suggest efficient operations and pricing power.
4.   Social Media Engagement Rate: For companies with a strong social media presence, monitoring metrics like likes, shares, comments, and followers can help gauge customer engagement, brand awareness, and potential market sentiment.
5.   P/E Ratio: A company's current market capitalization divided by its annual earnings.

Note: Grades column is here currently but will be removed soon to start the data splitting.



In [None]:
df.head()

Unnamed: 0,Return On Investment,Debt-to-Equity,Profit Margin,Social Media Engagement,P/E Ratio,Grades
0,92.42,0.15,33.25,10,56.57,A
1,30.99,0.0,21.45,9,23.47,A
2,45.63,0.01,24.63,9,30.06,A
3,25.7,0.19,22.53,8,27.82,A
4,4.05,6.21,0.34,5,0.0,F


##**Data Splitting**

###Dataset Transformations

The data is separated into X and Y, with X holding the feature columns (the KPIs) and Y holding the target labels (the grades).

In [None]:
X = df.drop(columns='Grades')
Y = df['Grades']

Next we use the standard scaler to put every feature on a comparable scale, so that the data works well with the SVM classification model that we will be using in the next section.

In [None]:
scaler = StandardScaler()

In [None]:
scaler.fit(X)

The standard scaler transformation now takes place, producing standardized features that can be fed into our machine learning model.

In [None]:
stdata = scaler.transform(X)

In [None]:
X = stdata
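To make the transformation concrete: the standard scaler replaces each value with (value - mean) / standard deviation, using the population standard deviation of each column. The sketch below recomputes this by hand for a Return On Investment of 6.23, using the mean and std reported by df.describe() above. Note that describe() reports the sample standard deviation, so it is converted first.

```python
import math

n = 59                       # number of rows, from df.shape
mean_roi = 16.761525         # from df.describe()
sample_std_roi = 14.733907   # from df.describe() (divides by n - 1)

# StandardScaler divides by n rather than n - 1, so convert the std.
population_std_roi = sample_std_roi * math.sqrt((n - 1) / n)

z = (6.23 - mean_roi) / population_std_roi
print(round(z, 3))  # -0.721
```

This agrees with the first standardized value printed in the Data Entry section, where the same ROI is used.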

Next we create two data sets, a testing set and a training set, using the split that is conventional for machine learning models. Our testing set will be 20% of the size of our original data set, while our training set will be 80%.

In [None]:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y, test_size=0.2, stratify=Y, random_state=2)
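In this notebook the scaler is fitted on the full data set before the split. A common alternative is to fit the scaler on the training rows only, so that the testing rows do not influence the preprocessing. The sketch below shows that variant with a scikit-learn pipeline. It runs on synthetic stand-in data (the array names and values are made up), so it illustrates the pattern only and says nothing about the results on the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 60 rows, 5 features, 3 balanced classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 5))
y_demo = np.repeat(["A", "B", "C"], 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# The pipeline fits the scaler on the training rows only,
# then reuses those statistics when predicting on new rows.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_tr, y_tr)
preds = model.predict(X_te)
print(preds.shape)  # (12,)
```

A fitted pipeline like this also accepts raw, unscaled values at prediction time, which removes the separate scaler.transform step.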

We print out the shape of the testing and training data just to verify once again that changes took place.

In [None]:
print(X.shape, X_test.shape, X_train.shape)

(59, 5) (12, 5) (47, 5)
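The 12 and 47 follow from how train_test_split handles a fractional test_size: the test count is rounded up, and the training set receives the remaining rows. A quick check of the arithmetic:

```python
import math

n_samples = 59
test_size = 0.2

n_test = math.ceil(test_size * n_samples)  # 11.8 rounds up to 12
n_train = n_samples - n_test               # the remaining rows

print(n_test, n_train)  # 12 47
```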


###SVM Classification Algorithm

Here we set up the SVM classifier, which will allow us to train and test our ML model.

**Support Vector Machine (SVM) Algorithm:** SVMs are very helpful in capturing complex relationships within a data set, which is why I chose one over other algorithms like linear regression or logistic regression.

In [None]:
classifier = svm.SVC(kernel='linear')

In our SVM model I chose a linear kernel, which calculates the inner product of the input features directly and skips the process of mapping our data onto a higher-dimensional space. In theory this leads to a more efficient program.
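As a small illustration of what the linear kernel computes, the sketch below takes the inner product of two feature vectors directly. The function name and the example vectors are made up for this illustration.

```python
def linear_kernel(x, y):
    """Linear kernel: the plain inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

x = [1.0, -0.5, 2.0]
y = [0.5, 4.0, 1.0]

# 1.0*0.5 + (-0.5)*4.0 + 2.0*1.0 = 0.5 - 2.0 + 2.0
print(linear_kernel(x, y))  # 0.5
```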

In [None]:
classifier.fit(X_train, Y_train)

###Dataset Prediction Accuracy

Here we have the results. Even without a large data set, the model currently predicts about 93.6% of the training set and about 91.7% of the testing set correctly.

**New Dataset Note:** Accuracy here measures how often the model reproduces the grades already assigned in the data set. It does not by itself tell you whether an investment will actually succeed. So if the accuracy on a new dataset is low, that is okay at first, since small samples give noisy scores. The main thing is that the newly trained model makes decisions that seem logical and understandable for the data you input. Over time the accuracy should go up as your sample size grows, so this metric alone should not be used to grade the model.

In [None]:
train_pred = classifier.predict(X_train)
accuracy = accuracy_score(Y_train, train_pred)  # arguments are (true labels, predictions)

Training dataset score accuracy:

In [None]:
print("Accuracy score of train data = {}".format(accuracy))

Accuracy score of train data = 0.9361702127659575


Testing dataset score accuracy:

In [None]:
test_pred = classifier.predict(X_test)
accuracy2 = accuracy_score(Y_test, test_pred)  # arguments are (true labels, predictions)

In [None]:
print('Accuracy score on test data = {}'.format(accuracy2))

Accuracy score on test data = 0.9166666666666666
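Accuracy is simply the number of correct predictions divided by the number of rows. With only 12 testing rows, a score of 0.9167 corresponds to 11 correct predictions out of 12, and each single mistake moves the score by about 8 percentage points. The labels below are made up to illustrate the calculation; they are not the notebook's actual predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical labels for 12 test rows, with one wrong prediction.
y_true = list("AABBCCCCDDFF")
y_pred = list("AABBCCCCDDFD")

print(accuracy(y_true, y_pred))  # 0.9166666666666666
```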


##**Investment Prediction**

###Dataset Information and Rules

Here is the data set head for reference.

KPIs (Key Performance Indicators) / Variables:
1.   Return on Investment: Used to measure the gains or losses created by an investment.
2.   Debt-to-Equity: Indicates the amount of debt which a company is taking on. Calculated by dividing total liabilities by shareholder's equity which can be found on its balance sheet.
3.   Profit Margins: Profit margins provide insights into a company's profitability. Key metrics include gross profit margin (revenue minus cost of goods sold divided by revenue) and net profit margin (net income divided by revenue). Higher margins suggest efficient operations and pricing power.
4.   Social Media Engagement Rate: For companies with a strong social media presence, monitoring metrics like likes, shares, comments, and followers can help gauge customer engagement, brand awareness, and potential market sentiment.
5.   P/E Ratio: A company's current market capitalization divided by its annual earnings.

Note: Grades column will not be seen by the testing set and is only shown now for reference as to what the prediction features will look like.



In [None]:
df.head(10)

Unnamed: 0,Return On Investment,Debt-to-Equity,Profit Margin,Social Media Engagement,P/E Ratio,Grades
0,92.42,0.15,33.25,10,56.57,A
1,30.99,0.0,21.45,9,23.47,A
2,45.63,0.01,24.63,9,30.06,A
3,25.7,0.19,22.53,8,27.82,A
4,4.05,6.21,0.34,5,0.0,F
5,3.4,5.03,1.49,6,6.57,F
6,2.5,6.47,0.14,2,0.0,F
7,1.03,8.5,0.79,2,0.18,F
8,25.11,0.07,20.43,8,21.59,A
9,3.47,5.52,0.58,4,0.0,F


###Data Entry

Here is where you enter the values of the investment you want to predict. In the fill-in section of the code below, edit the variables to match that investment.



The output is the predicted grade of the investment. Here is what each possible output means.

**Letter Grade Scale:**

**A**: (90% - 100%)

**B**: (80% - 89%)

**C**: (70% - 79%)

**D**: (60% - 69%)

**F**: (<60%)


This is where we predict the grade of an investment. If you are planning to use the web application for startup prediction, feel free to ignore this block of code completely.



In [None]:
#Enter Values of the Startup you want the ML Model to predict:
#***FOR YOU TO FILL OUT***
InvestmentName = "AMD"
DateOfInvestment = "10/31/2009"
ROI = 6.23
DebtToEquityRatio = -32.4
ProfitMargin = 42.05
SocialMediaEngagement = 5
PE_Ratio = 21.51
#*************************

# Feature order must match the columns the scaler and classifier were trained on.
# (Named "features" to avoid shadowing Python's built-in input function.)
features = (ROI, DebtToEquityRatio, ProfitMargin, SocialMediaEngagement, PE_Ratio)

data_changed = np.asarray(features)

# Reshape to a single row, since we are predicting one investment.
data_reshaped = data_changed.reshape(1,-1)

std_data = scaler.transform(data_reshaped)
print(std_data)

prediction = classifier.predict(std_data)
print("output grade: ", prediction)

[[ -0.72091714 -18.84904425   3.88829535  -0.90701876   0.80984396]]
output grade:  ['A']
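Notice that the second standardized value above is roughly -18.8, which means the entered Debt-to-Equity of -32.4 lies far outside anything the model saw during training. Predictions for such inputs are extrapolations and deserve extra caution. The sketch below flags values outside the min and max reported by df.describe(); the helper name out_of_range is hypothetical.

```python
# (min, max) of each KPI in the initial dataset, from df.describe().
TRAINING_RANGES = {
    "Return On Investment": (1.03, 92.42),
    "Debt-to-Equity": (0.0, 8.5),
    "Profit Margin": (0.14, 33.25),
    "Social Media Engagement": (0.0, 10.0),
    "P/E Ratio": (0.0, 56.57),
}

def out_of_range(values):
    """Return the KPI names whose value falls outside the training range."""
    flagged = []
    for name, value in values.items():
        low, high = TRAINING_RANGES[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged

# The values entered in the cell above.
example = {
    "Return On Investment": 6.23,
    "Debt-to-Equity": -32.4,
    "Profit Margin": 42.05,
    "Social Media Engagement": 5,
    "P/E Ratio": 21.51,
}
print(out_of_range(example))  # ['Debt-to-Equity', 'Profit Margin']
```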




###Anvil Web Application Data Entry Function

This is the investment_predict function used in the Anvil web app to allow users to predict startup investments through the web app rather than the notebook.


In [None]:
@anvil.server.callable
def investment_predict(investment_name, date_of_investment, roi, debt_to_equity_ratio,
                       profit_margin, social_media_engagement, pe_ratio):

  # investment_name and date_of_investment identify the entry; they are not model features.

  # The five KPI values, in the same column order the scaler and classifier were trained on.
  # The web app form needs to pass these five values for the prediction to work.
  features = (roi, debt_to_equity_ratio, profit_margin, social_media_engagement, pe_ratio)

  # Reshape the input into a single row and apply the standard scaler transformation
  # so that the values are on the same scale as the training data.
  data_changed = np.asarray(features, dtype=float)
  data_reshaped = data_changed.reshape(1,-1)
  std_data = scaler.transform(data_reshaped)
  #print(std_data)

  # Predicting the grade of the added investment using the SVM classifier.
  prediction = classifier.predict(std_data)

  # Return the grade as a plain string (for example 'A') rather than a NumPy array.
  return str(prediction[0])

This call keeps our notebook running in the background, which allows the Anvil app to keep calling its functions.


In [None]:
anvil.server.wait_forever()

KeyboardInterrupt: ignored