<a href="https://www.kaggle.com/kamaljp/money-value-flows?scriptVersionId=86680666" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Purpose of the Notebook

This notebook tells the story of countries and their companies: the top 2000 players that have captured not only sales and profits but also the hearts of their customers. It starts with an overview of the countries and how their assets and market values were put to work to generate profits.

# What to expect
Can we predict a company's profit from its Sales, Assets and Market Value (MV) data? This notebook tries to answer that question.

# How does it answer
A series of [visuals](#go_up) talks about the countries and companies. The code is kept simple, with comments explaining the reasoning wherever something complex is done. The notebook then discusses [machine learning models](#Next-Step) and uses a proven method to train them. The results are visualised, and finally [real world data](#real) is entered into the chosen model to predict a profit.

# A Sneak Peek
A small feature engineering exercise brings more insights to the table. Surprising countries like [Bangladesh, Nigeria](#vis-4) come out on top when viewed through certain features. The companies themselves come alive as their sales and market values are rendered in a [scatter plot](#vis-7).
How about classifying companies based on their assets? Or based on [engineered features?](#vis-13) New possibilities open up.

<a id='go_up'>PS: Purpose, What to Expect and A Sneak Peek are hidden in the cell above; unhide it to read them.</a>

1. [Visual_1: Country wise Sales and Number of Companies](#vis-1)
2. [Visual_2: Assets Vs Profits of the Companies based on Countries](#vis-2)
3. [Visual_3: Profits made per Dollar of Market value](#vis-3)
4. [Visual_4: Profits made per Dollar of Asset value](#vis-4)
5. [Visual_5: Profits made per Company in Country](#vis-5)
6. [Visual_6: Visualising Global Profit Distribution](#vis-6)
7. [Visual_7: Scatterplot of Company profits with respect to Countries](#vis-7)
8. [Visual_8: Countries and their Profitable Companies](#vis-8)
9. [Visual_9: More Impactful rendering of Visual 8](#vis-9)
10. [Visual_10: Do a Company's Finances Lead to Profits?](#vis-10)
11. [Visual_11: Does Sales lead to Profitability? OMG No....](#vis-11)
12. [Visual_12: Who and Where?....United States Companies](#vis-12)
13. [Visual_13: Can engineered features predict profit?](#vis-13)
14. [Visual_14: Does any kind of Correlation exist?](#vis-14)

After slicing and dicing the data and looking at the results, Sales and Market Value seem to be [correlated with the profit](#vis-14). Human ability has been augmented to some extent by the Plotly charts and visuals. Can this be trumped by the learning capabilities of machine learning algorithms?

The next part of the notebook enters the world of Random Forests, ensembles and cross-validation folds: a new way to look at the data. Ready? Click [here](#Next-Step).

15. [Can the Machine Learning Algorithms do better???](#Next-Step)
    
    a) [Prepping data](#prep)
    
    b) [Feature Selection](#feat_sel)
    
    c) [Train n Test Dataset](#data_split)
    
    d) [Cross Validation Folding n Metrics](#eval_metric)
    
    e) [Instantiating Models](#mod_ins)
    
    f) [Visualising Results metrics](#resul)
    
    g) [Conclusion!!! Can we learn something more???](#Conclusion)

16. [Real World Predictions with the model created](#real)

In [None]:
!pip install sidetable
import os
import textwrap #Will come in handy to wrap the lengthy texts
import warnings
from itertools import repeat

import numpy as np
import pandas as pd
from pandas import option_context
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.offline import init_notebook_mode
from plotly.subplots import make_subplots
import sidetable as stb #useful libraries and functions

init_notebook_mode(connected=True)
pd.set_option('display.max_columns', 5000)
warnings.filterwarnings("ignore")
#os.mkdir('/kaggle/working/individual_charts/')

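def long_sentences_seperate(sentence, width=30):
    """Wrap a long sentence onto several lines joined by <br> for plot labels"""
    try:
        splittext = textwrap.wrap(sentence,width)
        text = '<br>'.join(splittext) #wrapped lines are joined with HTML line breaks
        return text
    except (AttributeError, TypeError): #non-string input (e.g. NaN) is returned unchanged
        return sentence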

def load_csv(base_dir,file_name):
    """Loads a CSV file into a Pandas DataFrame"""
    file_path = os.path.join(base_dir,file_name)
    df = pd.read_csv(file_path,low_memory=False)
    return df    


In [None]:
#Supporting Functions that are used at various locations in the notebook

#Function to reduce multi-word names to their initials
def shrnk_name(company):
    words = company.split() #split() without arguments skips repeated spaces
    if len(words) > 1:
        return ''.join(word[0] for word in words)
    return company

#Function that converts the strings that need to be numbers. This function grew as I started finding issues,
#such as "M", "," and "." in the Sales, Profit, MV and Assets values
def convert_str(x):
    temp = x[1:].replace(',','') #dropping the first character and any ',' separators
    value = float(temp[:-2]) #dropping the last two characters (the unit suffix)
    if 'M' in temp: #'M' signifies Millions
        return value/1000 #converting to Billions
    return value
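
To make the conversion concrete, here is a small stand-alone sketch of the same idea. The sample strings and the helper name `to_billions` are illustrative assumptions, not taken from the dataset; the actual format of the raw columns should be checked against the CSV.

```python
def to_billions(raw):
    """Convert a string such as '$1,234.5 M' or '$12.3 B' to billions (float).

    Assumes one leading currency symbol and a two-character suffix
    (a space followed by 'M' or 'B'). Both are assumptions for this sketch.
    """
    body = raw[1:].replace(',', '')   # drop the currency symbol and commas
    number = float(body[:-2])         # drop the ' M' / ' B' suffix
    return number / 1000 if body.endswith('M') else number


# Hypothetical sample values, invented for illustration
samples = ['$1,234.5 M', '$12.3 B', '$950 M']
converted = [to_billions(s) for s in samples]
print(converted)
```

Values in millions are divided by 1000, so every column ends up on the same scale (billions).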

In [None]:
base_dir = '../input/fortune-global-2000-companies-till-2021'
file_name = 'fortune_2000_in_2021.csv'
fortune = load_csv(base_dir=base_dir,file_name=file_name)

In [None]:
#Converting the columns with values in string format to numeric (float) format
fortune['Sales'] = fortune.Sales.apply(convert_str)
fortune['Profit'] = fortune.Profit.apply(convert_str)
fortune['MV'] = fortune['Market Value'].apply(convert_str)

In [None]:
#NaN in Assets cannot be sliced by convert_str ("not subscriptable" error), so missing values are set to 0.0 first.
fortune.loc[fortune.Assets.isna(),'Assets'] = 0.0
fortune.loc[(fortune.Assets != 0),'Assets'] = fortune.loc[(fortune.Assets != 0),'Assets'].apply(convert_str)
fortune.drop('Market Value',axis=1,inplace=True)
fortune['Assets'] = fortune['Assets'].astype('float') #direct assignment so the float dtype replaces object

In [None]:
#Creating a groupby transformation to get the aggregate values per country
comp = fortune.groupby('Country').agg({'Name':'count','Sales':'sum','Profit':'sum','Assets':'sum','MV':'sum'}).reset_index()
comp.sort_values(by='Name',ascending=False,inplace=True)

# <a id='vis-1'> Visual_1: Country wise Sales and Number of Companies </a>

In [None]:
visual_1 = make_subplots(specs=[[{"secondary_y": True}]]) #Creating Subplots to bring 2nd axis

visual_1.add_trace(go.Scatter(x=comp.Country,y=comp.Name,name='Number of Companies',mode='lines'),secondary_y=False)
visual_1.add_trace(go.Bar(x=comp.Country,y=comp.Sales,name='Sales_Country'),secondary_y=True)

visual_1.update_xaxes(title_text="Country")

# Set y-axes titles
visual_1.update_yaxes(title_text="<b>Counts per Country</b>", secondary_y=False)
visual_1.update_yaxes(title_text="<b>Sales per Country</b>", secondary_y=True)

visual_1.update_layout(legend=dict(x=0,y=-0.1,traceorder="reversed",orientation="h"),
                       title='Sales_Country',width=1000,height=1000)


visual_1.show()

In [None]:
#Let's cook some features to tell better stories and start answering interesting questions
#PoA : Profits made per $ of Asset value 
#PoM : Profits made per $ of Market value 
#PAv : Average Profit per company
comp.loc[:,'PoA'] = round(comp.Profit/comp.Assets,2)
comp.loc[:,'PoM'] = round(comp.Profit/comp.MV,2)
comp.loc[:,'PAv'] = round(comp.Profit/comp.Name,2)
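
The three ratios are easier to follow on a tiny table. The sketch below uses invented numbers and hypothetical country labels `A` and `B`; only the column names mirror the ones used in this notebook.

```python
import pandas as pd

# Toy data; the numbers are invented purely for illustration
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B'],
    'Name': ['Alpha', 'Beta', 'Gamma'],
    'Profit': [2.0, 4.0, 3.0],
    'Assets': [20.0, 40.0, 10.0],
    'MV': [10.0, 20.0, 30.0],
})

# Same aggregation pattern as above: count the companies, sum the money columns
per_country = (toy.groupby('Country')
                  .agg({'Name': 'count', 'Profit': 'sum', 'Assets': 'sum', 'MV': 'sum'})
                  .reset_index())

per_country['PoA'] = (per_country.Profit / per_country.Assets).round(2)  # profit per $ of assets
per_country['PoM'] = (per_country.Profit / per_country.MV).round(2)      # profit per $ of market value
per_country['PAv'] = (per_country.Profit / per_country.Name).round(2)    # average profit per company
print(per_country)
```

Country `B` earns more per dollar of assets even though its total profit is smaller, which is the kind of effect that can push smaller economies up the ranking.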

# <a id='vis-2'> Visual_2: Assets Vs Profits of the Companies based on Countries </a>

In [None]:
visual_2 = go.Figure()
visual_2.add_trace(go.Bar(y=comp.Country,x=comp.Assets,name='Assets_Country',orientation='h'))
visual_2.add_trace(go.Bar(y=comp.Country,x=comp.Profit,name='Profit_Country',orientation='h'))
visual_2.add_trace(go.Bar(y=comp.Country,x=comp.Sales,name='Sales',orientation='h',))

# Set axes titles
visual_2.update_yaxes(title_text="Country")
visual_2.update_xaxes(title_text="<b>Assets, Profits & Sales per Country</b>")

visual_2.update_layout(legend=dict(x=0,y=-0.1,traceorder="reversed",orientation="h"),
                       title='Assets Vs Profit Vs Sales',width=1000,height=1000)

visual_2.show()

# <a id='vis-3'> Visual_3: Profits made per $ of Market value </a>

In [None]:
visual_3 = go.Figure()
visual_3.add_trace(go.Bar(y=comp.Country,x=comp.PoM,name='Profit Per MV$',orientation='h'))

# Set axes titles
visual_3.update_yaxes(title_text="Country")
visual_3.update_xaxes(title_text="<b>Profits per MV in a Country</b>")

visual_3.update_layout(legend=dict(x=0,y=-0.1,traceorder="reversed",orientation="h"),
                       title='Profits Vs MV',width=1000,height=1000,
                       yaxis = {'categoryorder':'total ascending'})

visual_3.show()

# <a id='vis-4'> Visual_4: Profits made per $ of Asset value </a>

In [None]:
visual_4 = go.Figure()
visual_4.add_trace(go.Bar(y=comp.Country,x=comp.PoA,name='Profit Per Assets $',orientation='h'))

# Set axes titles
visual_4.update_yaxes(title_text="Country")
visual_4.update_xaxes(title_text="<b>Profits per Assets in a Country</b>")

visual_4.update_layout(legend=dict(x=0,y=-0.1,traceorder="reversed",orientation="h"),
                       title='Profits Vs Assets',width=1000,height=1000,
                       yaxis = {'categoryorder':'total ascending'})

visual_4.show()

# <a id='vis-5'> Visual_5: Profits made per Company in Country </a>

In [None]:
visual_5 = go.Figure()
visual_5.add_trace(go.Bar(y=comp.Country,x=comp.PAv,name='Profit Per Company',orientation='h'))

# Set axes titles
visual_5.update_yaxes(title_text="Country")
visual_5.update_xaxes(title_text="<b>Profits per Company</b>")

visual_5.update_layout(legend=dict(x=0,y=-0.1,traceorder="reversed",orientation="h"),
                       title='Profits per Company',width=1000,height=1000,
                       yaxis = {'categoryorder':'total ascending'})

visual_5.show()

In [None]:
#Melting the dataframe to long format makes it easier to visualise several values in one faceted choropleth chart
comp_melt = pd.melt(comp,id_vars=['Country'],value_vars=['Sales','MV','Assets','Profit'])
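
As a quick illustration of what melting does, the sketch below reshapes a two-row toy table (invented values) from wide to long format. Each value column becomes a label in `variable`, which is what a faceted chart can split on.

```python
import pandas as pd

# Wide format: one row per country, one column per measure (toy values)
wide = pd.DataFrame({
    'Country': ['A', 'B'],
    'Sales': [100.0, 50.0],
    'Profit': [10.0, 5.0],
})

# Long format: one row per (country, measure) pair
long = pd.melt(wide, id_vars=['Country'], value_vars=['Sales', 'Profit'])
print(long)
```

The result has the columns `Country`, `variable` and `value`, with one row for every country and measure combination.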

# <a id='vis-6'> Visual_6: Visualising Global Profit Distribution </a>

In [None]:
#There is advantage of looking at the global map to see the defining factors
visual_6 = px.choropleth(comp_melt, locations="Country",color='value',
                         locationmode='country names',facet_row='variable')
visual_6.update_layout(margin={"r":0,"t":0,"l":0,"b":0},width=700,height=1000,
                       title = 'Distribution of Sales, MV, Assets & Profit globally')
visual_6.show()

So far we have looked at the high-level picture of companies and countries. How about drilling down to the company level? Note that the number of companies per country differs by orders of magnitude.

# <a id='vis-7'> Visual_7: Scatterplot of Company profits with respect to Countries </a>

In [None]:
fortune['Name_Abbrv'] = fortune.Name.apply(lambda x: shrnk_name(x))
fortune['PoM'] = fortune.Profit/fortune.MV
visual_7 = px.scatter(data_frame=fortune,y='Name_Abbrv',x='Sales',
                      animation_frame='Country',color='PoM')
visual_7.update_xaxes(type='log')#To make the location of individual company more visible
visual_7.update_layout(title = 'Distribution of Profit by companies',height=1000)
visual_7.show()

# <a id='vis-8'> Visual_8: Countries and their Profitable Companies </a>

In [None]:
fortune['Profitable'] = fortune.Profit.apply(lambda x : 'In_Profit' if x > 0 else 'In_Loss')
visual_grp_8 = fortune.groupby(['Profitable','Country'])['Name_Abbrv'].count().reset_index()
visual_8 = go.Figure()
visual_8.add_trace(go.Bar(y=visual_grp_8.loc[visual_grp_8.Profitable == 'In_Loss','Country'],
                          x=visual_grp_8.loc[visual_grp_8.Profitable == 'In_Loss','Name_Abbrv'],
                          name='In Loss',orientation='h'))
visual_8.add_trace(go.Bar(y=visual_grp_8.loc[visual_grp_8.Profitable == 'In_Profit','Country'],
                          x=visual_grp_8.loc[visual_grp_8.Profitable == 'In_Profit','Name_Abbrv'],
                          name='In Profit',orientation='h'))
visual_8.update_layout(barmode='stack',height = 1000,title='Countries & Their Companies')

# <a id='vis-9'> Visual_9: More Impactful rendering of Visual 8 </a>

In [None]:
visual_9 = px.choropleth(visual_grp_8, locations="Country",color='Name_Abbrv',
                         locationmode='country names',facet_row='Profitable',
                         title='Impactful view')
visual_9.update_layout(margin={"r":0,"t":0,"l":0,"b":0},width = 1000)
visual_9.show()

# <a id='vis-10'> Visual_10: Do a Company's Finances Lead to Profits? </a>

In [None]:
visual_10 = go.Figure()

visual_10.add_trace(go.Scatter(x=fortune.Assets,y=fortune.Profit,name='Profit & Assets',mode='markers'))
visual_10.add_trace(go.Scatter(x=fortune.MV,y=fortune.Profit,name='Profit & MarketValue',mode='markers'))
visual_10.update_xaxes(title='Assets & Market value in B USD')
visual_10.update_yaxes(title='Profit in Billion USD')
visual_10.update_layout(title='Scatter plot of Companies', height = 1000, width = 1000)
visual_10.show()

# <a id='vis-11'> Visual_11: Does Sales lead to Profitability? OMG No.... </a>

In [None]:
visual_11 = go.Figure()

visual_11.add_trace(go.Scatter(x=fortune.Sales,y=fortune.Profit,name='Profit & Sales',
                               mode='markers',text=fortune.Name_Abbrv,textposition='top center'))
visual_11.update_xaxes(title='Sales in B USD')
visual_11.update_yaxes(title='Profit in Billion USD')
#Adding annotations
visual_11.add_annotation(text="Companies not Profitable with Sales in Billion",xref="paper", yref="paper",
                         x=0.1, y=0.3, showarrow=False,
                         font=dict(family="Courier New, monospace,bold",
                                   size=24,color="#ff000f"))

visual_11.update_layout(title='Scatter plot of Sales & Profit', height = 1000, width = 1000)
visual_11.show()

# <a id='vis-12'> Visual_12: Who and Where?....United States Companies </a> 

In [None]:
#Companies with Sales above 5 Billion USD and a loss of more than 2 Billion USD
visual_grp_12 = fortune[(fortune.Profit < -2) & (fortune.Sales > 5)]

visual_12 = px.scatter(data_frame=visual_grp_12,x='Sales',y='Profit',
                       color='Country',text='Name_Abbrv',size='Sales',size_max=60)

#visual_12.add_trace(go.Scatter(x=visual_grp_12.Sales,y=visual_grp_12.Profit,name='Profit & Sales',
 #                              mode='markers+text',text=visual_grp_12.Name_Abbrv,textposition='top center'))
visual_12.update_xaxes(title='Sales in B USD')
visual_12.update_yaxes(title='Profit in Billion USD')
#Adding text position.
visual_12.update_traces(textposition='top center')
#Adding annotations
visual_12.add_annotation(text="Who and Where",xref="paper", yref="paper",
                         x=0.1, y=0.3, showarrow=False,
                         font=dict(family="Courier New, monospace,bold",
                                   size=24,color="#ff000f"))

visual_12.update_layout(height = 1000, width = 1000)
visual_12.show()

# <a id='vis-13'> Visual_13: Can engineered features predict profit? </a> 

In [None]:
#Engineering features for the main dataframe
fortune['PoS'] = fortune.Profit/fortune.Sales
fortune.loc[(fortune.Assets != 0),'PoA'] = fortune.loc[(fortune.Assets != 0),'Profit']/fortune.loc[(fortune.Assets != 0),'Assets']
fortune.loc[(fortune.Assets == 0),'PoA'] = 0.0
#The PoA column can remain as object dtype, so it is converted to float explicitly
fortune['PoA'] = fortune['PoA'].astype('float')

In [None]:
visual_13 = make_subplots(rows=3,cols=1)

visual_13.add_trace(go.Scatter(x=fortune.PoA,y=fortune.Profit,name='Profit Vs PoA',
                               mode='markers',text=fortune.Name_Abbrv,textposition='top center'),row=1,col=1)
visual_13.add_trace(go.Scatter(x=fortune.PoM,y=fortune.Profit,name='Profit Vs PoM',
                               mode='markers',text=fortune.Name_Abbrv,textposition='top center'),row=2,col=1)
visual_13.add_trace(go.Scatter(x=fortune.PoS,y=fortune.Profit,name='Profit Vs PoS',
                               mode='markers',text=fortune.Name_Abbrv,textposition='top center'),row=3,col=1)
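#Each subplot gets its own x-axis title
visual_13.update_xaxes(title_text='PoA : Profit Over Assets',row=1,col=1)
visual_13.update_xaxes(title_text='PoM : Profit Over Market Value',row=2,col=1)
visual_13.update_xaxes(title_text='PoS : Profit Over Sales',row=3,col=1)
visual_13.update_yaxes(title_text='Profit in Billion USD')

visual_13.update_layout(title='Scatter plots of Profit against engineered features', height = 1000, width = 1000)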
visual_13.show()

# <a id='vis-14'> Visual_14: Does any kind of Correlation exist? </a> 

In [None]:
#Only numeric columns take part in the correlation; Name_Abbrv and Profitable are strings
visual_corr = fortune[['Sales','Profit', 'Assets', 'MV','PoM','PoS','PoA']]
corr_mat = visual_corr.corr(min_periods=1,method='pearson')
visual_14 = px.imshow(corr_mat, text_auto=True)
visual_14.show()
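
For readers who want to see the correlation step in isolation, here is a minimal sketch on an invented table. It keeps only the numeric columns before calling `corr`, which avoids errors from text columns in recent pandas versions.

```python
import pandas as pd

# Toy data: Profit rises with Sales, Assets falls with Sales (invented values)
toy = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    'Sales': [1.0, 2.0, 3.0, 4.0],
    'Profit': [2.0, 4.0, 6.0, 8.0],
    'Assets': [4.0, 3.0, 2.0, 1.0],
})

numeric = toy.select_dtypes(include='number')   # drops the text column 'Name'
corr = numeric.corr(method='pearson')
print(corr.round(2))
```

A value near 1 means two columns move together, a value near -1 means they move in opposite directions, and a value near 0 means no linear relationship.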

# <a id='Next-Step'> Can the Machine Learning Algorithms do better??? </a> 

Our eyes and the visualisation methods have revealed some of the dependencies of a company's profit. Can machine learning algorithms learn these dependencies as well?

In [None]:
#A new set of libraries that provide the machine learning algorithms is imported here
#Library to scale the values to an equal range, to reduce confusion for the algorithms
from sklearn.preprocessing import StandardScaler

#Splitting the data for learning and searching for new insights using these libraries
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

#Engines of Machine learning
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.neural_network import MLPRegressor

#Libraries for Deep Learning Models
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD #TensorFlow has to be the main library for this import
from keras.layers import LSTM
from keras.wrappers.scikit_learn import KerasRegressor

# Error Metrics to do cross_validation
from sklearn.metrics import mean_squared_error

# Feature Selection
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2, f_regression

#Plotting 
from pandas.plotting import scatter_matrix


### <a id='prep'> Prepping data </a>

In [None]:
dataset = fortune[['Sales','Profit','Assets','MV','PoM','PoS','PoA']]
dataset.describe()

In [None]:
X = dataset[['Sales','Assets','MV','PoM','PoS','PoA']] #Independent variables
Y = dataset['Profit'] #Target variable to be predicted

### <a id='feat_sel'> Feature Selection </a>

In [None]:
bestfeatures = SelectKBest(k=5, score_func=f_regression)
fit = bestfeatures.fit(X,Y) #Fitting on the data
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)

#concat two dataframes for better visualization 
featureScores = pd.concat([dfcolumns,dfscores],axis=1)
featureScores.columns = ['Specs','Score']  #naming the dataframe columns
featureScores.nlargest(10,'Score').set_index('Specs')  #print best features
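
The scoring idea behind `SelectKBest` with `f_regression` can be checked on synthetic data where the answer is known. In the sketch below, the column names `signal`, `noise_1` and `noise_2` are hypothetical and the data is randomly generated; only `signal` actually drives the target.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
n = 200
features = pd.DataFrame({
    'signal': rng.normal(size=n),    # truly related to the target
    'noise_1': rng.normal(size=n),   # unrelated
    'noise_2': rng.normal(size=n),   # unrelated
})
target = 3.0 * features['signal'] + rng.normal(scale=0.1, size=n)

# Score every feature with a univariate F-test and keep the single best one
selector = SelectKBest(score_func=f_regression, k=1).fit(features, target)
scores = pd.Series(selector.scores_, index=features.columns).sort_values(ascending=False)
print(scores)
```

Because each feature is scored on its own, this method can miss features that only matter in combination with others, so the ranking is a first filter rather than a final verdict.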

### <a id='data_split'> Train n Test Dataset </a>

In [None]:
validation_size = 0.2

#The split below is sequential: the first 80% of the rows train the model and the last 20% test it.
#If the row order carries no meaning, a random split can be used instead:
# seed = 7
# X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=validation_size, random_state=seed)
train_size = int(len(X) * (1-validation_size))
X_train, X_test = X.iloc[:train_size], X.iloc[train_size:]
Y_train, Y_test = Y.iloc[:train_size], Y.iloc[train_size:]
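
The difference between the sequential split used here and a random split can be sketched with plain row positions. The ten-row array below is a stand-in, and the idea that the rows are ordered (for example by rank) is an assumption for illustration.

```python
import numpy as np

rows = np.arange(10)          # stand-in for the row positions of an ordered table
validation_size = 0.2
train_size = int(len(rows) * (1 - validation_size))

# Sequential split: the first 80% train, the last 20% test
seq_train, seq_test = rows[:train_size], rows[train_size:]

# Random split: shuffle the positions first, then cut at the same point
rng = np.random.default_rng(7)
shuffled = rng.permutation(rows)
rnd_train, rnd_test = shuffled[:train_size], shuffled[train_size:]

print('sequential test rows:', seq_test)
print('random test rows:    ', rnd_test)
```

With a sequential split on an ordered table, the test set contains only the tail of the ordering, so the model is evaluated on rows that may look different from the ones it was trained on. A random split mixes both ends into train and test.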

### <a id='eval_metric'> Cross Validation Folding n Metrics </a>

In [None]:
num_folds = 10
seed = 7
# scikit-learn scorers follow a "higher is better" convention, so the MSE is exposed as its negative.
# To avoid confusion, and to allow comparison with other models, the final scores are multiplied by -1 later on.
scoring = 'neg_mean_squared_error' 
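
A minimal sketch of how the negated score behaves, using synthetic data and a plain linear regression. The variable names `X_demo` and `y_demo` and the 5-fold setting are choices made for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(7)
X_demo = rng.normal(size=(100, 2))
y_demo = X_demo @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=100)

# shuffle=True is required for random_state to have any effect
kfold = KFold(n_splits=5, shuffle=True, random_state=7)
neg_mse = cross_val_score(LinearRegression(), X_demo, y_demo,
                          cv=kfold, scoring='neg_mean_squared_error')

mse = -neg_mse   # flip the sign to get the usual, non-negative MSE
print(mse.mean(), mse.std())
```

Each fold returns one score, so the mean summarises typical error and the standard deviation shows how much it varies between folds.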

### <a id='mod_ins'> Instantiating Models </a>

In [None]:
##### Regression and Tree Regression algorithms

models = []
models.append(('LR', LinearRegression()))
models.append(('LASSO', Lasso()))
models.append(('EN', ElasticNet()))
models.append(('KNN', KNeighborsRegressor()))
models.append(('CART', DecisionTreeRegressor()))
models.append(('SVR', SVR()))

##### Neural Network algorithms

models.append(('MLP', MLPRegressor()))

##### Ensemble Models

# Boosting methods
models.append(('ABR', AdaBoostRegressor()))
models.append(('GBR', GradientBoostingRegressor()))

# Bagging methods
models.append(('RFR', RandomForestRegressor()))
models.append(('ETR', ExtraTreesRegressor()))

### <a id='mod_train'> Training and CVing Models </a>

In [None]:
names = []
kfold_results = []
test_results = []
train_results = []

for name, model in models:
    names.append(name)
    
    ## K Fold analysis:
    
    #random_state only has an effect with shuffle=True, and newer scikit-learn versions raise an error otherwise
    kfold = KFold(n_splits=num_folds)
    #converted mean squared error to positive. The lower the better
    cv_results = -1* cross_val_score(model, X_train, Y_train, cv=kfold, scoring=scoring)
    kfold_results.append(cv_results)
    

    # Full Training period
    res = model.fit(X_train, Y_train)
    train_result = mean_squared_error(res.predict(X_train), Y_train)
    train_results.append(train_result)
    
    # Test results
    test_result = mean_squared_error(res.predict(X_test), Y_test)
    test_results.append(test_result)
    
    msg = "%s: %f (%f) %f %f" % (name, cv_results.mean(), cv_results.std(), train_result, test_result)
    print(msg)

### <a id='resul'> Visualising Results metrics </a>

In [None]:
#Creating the dataframe. 
results = dict({'names': names,'test_results': test_results,'train_results': train_results})
result_df = pd.DataFrame(results)

#Creating the visual.
visual_15 = go.Figure()
visual_15.add_trace(go.Scatter(x=result_df.names,y=result_df.test_results,name='test_results'))
visual_15.add_trace(go.Bar(x=result_df.names,y=result_df.train_results,name='train_results'))
visual_15.update_yaxes(title='Results')
visual_15.update_xaxes(title='Models')
visual_15.update_layout(title='Results of the model training',height = 600, width = 800)
visual_15.show()

In [None]:
visual_16 = go.Figure()

for name, data in zip(names, kfold_results): #one box per model
    visual_16.add_trace(go.Box(y=data,name=name))

visual_16.update_yaxes(title='Results')
visual_16.update_xaxes(title='Models')
visual_16.update_layout(title='Results of the k-fold training',height = 600, width = 800)
visual_16.show()

### <a id='Conclusion'> Conclusion!!! Can we learn something more??? </a>

The CART, GBR, Random Forest and Extra Trees models do a good job under the 10-fold cross-validation conditions. Even with the best predictive models at our disposal, a better understanding of the features can be more illuminating.

Some visualisations of the predicted and actual values are shown below for better understanding.

### <a id='Pred_vis'> CART Model Prediction plot </a>

Zooming in on the graph shows how the CART model has predicted the profit and loss of many of the companies in the dataset. We have the model, now what???

In [None]:
#Retraining the CART model for visualisation
cart = DecisionTreeRegressor()
cart.fit(X_train,Y_train)
Y_pred = cart.predict(X_test)

pred_vis = go.Figure()
pred_vis.add_trace(go.Scatter(x=X_test.index,y=Y_test,name='Test_Profit'))
pred_vis.add_trace(go.Scatter(x=X_test.index,y=Y_pred,name='predicted_Profit',mode='markers'))
pred_vis.show()

### <a id='real'> Real World Predictions </a>

It starts with gathering real world data that is in neither the train nor the test set. In this case the following data is gathered:

'Sales' = 25 B USD, 

'Assets' = 50 B USD, 

'MV' = 37 B USD, 

An advantage of the engineered features is that past values can be used:

'PoM'= 0.5, 

'PoS'= 2.5, 

'PoA'=4.0.

We input these values into the model that has been trained. We shall do that next.



In [None]:
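#Values listed above, in the same order as the columns of X: Sales, Assets, MV, PoM, PoS, PoA
real_data = np.array([25,50,37,0.5,2.5,4])
x = pd.DataFrame(real_data.reshape(1,-1),columns=X.columns)
print('The profit of the fictitious company is {} B USD'.format(cart.predict(x)[0]))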
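
The same prediction step can be tried end to end without the dataset. The sketch below trains a small tree on randomly generated data with an invented relationship between Sales and profit, and uses a shortened, hypothetical feature list, so the predicted number has no real-world meaning. What it shows is the mechanics: a new row must carry the same columns, in the same order, as the training data.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

feature_names = ['Sales', 'Assets', 'MV']   # shortened feature list for this sketch

rng = np.random.default_rng(0)
train_X = pd.DataFrame(rng.uniform(1, 100, size=(50, 3)), columns=feature_names)
train_y = 0.1 * train_X['Sales']            # invented relationship, illustration only

tree = DecisionTreeRegressor(random_state=0).fit(train_X, train_y)

# One new row, with the columns in the same order as during training
new_row = pd.DataFrame([[25.0, 50.0, 37.0]], columns=feature_names)
prediction = tree.predict(new_row)[0]
print('Predicted profit: {:.2f} B USD'.format(prediction))
```

A decision tree can only return values it has seen in training leaves, so predictions for inputs far outside the training range should be treated with caution.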

###  The model goes into the wild, and starts predicting companies and their profits for YOU!!! You become a millionaire... (Just Kidding...Don't try this at home) 

Click [here](#go_up)