# Welcome to Full Stack Machine Learning's Week 4 Project!

In the final week, you will return to the workflow you built last week on the [taxi dataset](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page). 

## Task 1: Deploy the champion
Use what you have learned over the last two weeks to make the necessary modifications and deploy your latest version of the `TaxiFarePrediction` flow to Argo. Use `--branch champion` to mark this deployment as the champion model.

In [64]:
%%writefile ../flows/cloud/taxiprediction_champion.py
from metaflow import FlowSpec, step, card, conda_base, project, current, Parameter, Flow, trigger, retry, timeout,catch
from metaflow.cards import Markdown, Table, Image, Artifact

URL = "https://outerbounds-datasets.s3.us-west-2.amazonaws.com/taxi/latest.parquet"
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'

@trigger(events=['s3'])
@conda_base(libraries={'pandas': '1.4.2', 'pyarrow': '11.0.0', 'numpy': '1.21.2', 'scikit-learn': '1.1.2'})
@project(name="taxi_fare_prediction")
class TaxiFarePrediction(FlowSpec):

    data_url = Parameter("data_url", default=URL)

    def transform_features(self, df):

        obviously_bad_data_filters = [
            df.fare_amount > 0,         
            df.trip_distance <= 100,    
            df.trip_distance > 0,
            df.passenger_count > 0,
            df.mta_tax > 0,
            df.tip_amount >= 0,
            df.tolls_amount >= 0,
            df.total_amount > 0,
            df.PULocationID !=df.DOLocationID,
            df.hour > 0
        ]

        for f in obviously_bad_data_filters:
            df = df[f]

        
        return df

    @catch(var="read_failure")
    @retry(times=4)
    @timeout(minutes=10)
    @step
    def start(self):

        import pandas as pd
        from sklearn.model_selection import train_test_split

        self.df = self.transform_features(pd.read_parquet(self.data_url))

        self.X = self.df["trip_distance"].values.reshape(-1, 1)
        self.y = self.df["total_amount"].values
        self.next(self.linear_model)

    @step
    def linear_model(self):
        from sklearn.linear_model import LinearRegression

        self.model = LinearRegression()

        self.next(self.validate)
                
    
    def gather_sibling_flow_run_results(self):

        # storage to populate and feed to a Table in a Metaflow card
        rows = []

        # loop through runs of this flow 
        for run in Flow(self.__class__.__name__):
            if run.id != current.run_id:
                if run.successful:
                    icon = "✅" 
                    msg = "OK"
                    score = str(run.data.scores.mean())
                else:
                    icon = "❌"
                    msg = "Error"
                    score = "NA"
                    for step in run:
                        for task in step:
                            if not task.successful:
                                msg = task.stderr
                row = [Markdown(icon), Artifact(run.id), Artifact(run.created_at.strftime(DATETIME_FORMAT)), Artifact(score), Markdown(msg)]
                rows.append(row)
            else:
                rows.append([Markdown("✅"), Artifact(run.id), Artifact(run.created_at.strftime(DATETIME_FORMAT)), Artifact(str(self.scores.mean())), Markdown("This run...")])
        return rows
                
    
    @card(type="corise")
    @step
    def validate(self):
        from sklearn.model_selection import cross_val_score
        self.scores = cross_val_score(self.model, self.X, self.y, cv=5,scoring='r2')
        current.card.append(Markdown("# Taxi Fare Prediction Champion Results"))
        current.card.append(Table(self.gather_sibling_flow_run_results(), headers=["Pass/fail", "Run ID", "Created At", "R^2 score", "Stderr"]))
        self.next(self.end)


    @step
    def end(self):
        self.model_type = "baseline"
        print("Score = %s" % self.scores.mean())


if __name__ == "__main__":
    TaxiFarePrediction()

Overwriting ../flows/cloud/taxiprediction_champion.py


In [65]:
! python ../flows/cloud/taxiprediction_champion.py --environment=conda --production --branch champion argo-workflows create 

Metaflow 2.8.6+ob(v1) executing TaxiFarePrediction for user:sandbox
Project: taxi_fare_prediction, Branch: prod.champion
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
Deploying taxifareprediction.prod.champion.taxifareprediction to Argo Workflows...

The namespace of this production flow is
    production:mfprj-ovzw7jjg7psagpyw-0-pcke
To analyze results of this production flow add this line in your notebooks:
    namespace("production:mfprj-ovzw7jjg7psagpyw-0-pcke")

In [66]:
! python ../flows/cloud/taxiprediction_champion.py --environment=conda --production --branch champion argo-workflows trigger

Metaflow 2.8.6+ob(v1) executing TaxiFarePrediction for user:sandbox
Project: taxi_fare_prediction, Branch: prod.champion
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
Workflow taxifareprediction.prod.champion.taxifareprediction triggered on Argo Workflows (run-id argo-taxifareprediction.prod.champion.taxifareprediction-mp65m).
See the run in the UI at https://ui-pw-535851483.outerbounds.dev/TaxiFarePrediction/argo-taxifareprediction.prod.champion.taxifareprediction-mp65m

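The `Branch:` line in the deployment output reflects how `@project` combines the `--production` and `--branch` flags. The snippet below is a minimal sketch of that naming rule as described in the Metaflow `@project` documentation. `project_branch` is a hypothetical helper written for illustration only; it is not part of Metaflow's API.

```python
def project_branch(user, branch=None, production=False):
    """Hypothetical helper: mimic the branch name that @project reports."""
    if production:
        # --production alone gives "prod"; with --branch it gives "prod.<branch>"
        return "prod" if branch is None else "prod." + branch
    if branch is not None:
        # --branch without --production is treated as a test branch
        return "test." + branch
    # With neither flag, runs land in a per-user branch
    return "user." + user


print(project_branch("sandbox", branch="champion", production=True))
print(project_branch("sandbox", branch="challenger", production=True))
```

Because the champion and challenger deployments resolve to different branches of the same project, they can live side by side on Argo under separate production namespaces.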

## Task 2: Build the challenger
Develop a second model using the same `TaxiFarePrediction` architecture. Then deploy the flow to Argo with `--branch challenger`. 
<br>
<br>
Hint: Modify the `linear_model` step. 
<br>
Bonus: Write a one-paragraph summary of how you developed the second model and tested it before deploying the challenger flow. Let us know in Slack what you found challenging about the task. 

In [53]:
%%writefile ../flows/cloud/taxiprediction_multiple_models_flow.py
from metaflow import FlowSpec, step, card, conda_base, current, Parameter, Flow, trigger
from metaflow.cards import Markdown, Table, Image, Artifact

URL = "https://outerbounds-datasets.s3.us-west-2.amazonaws.com/taxi/latest.parquet"
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'

@trigger(events=['s3'])
@conda_base(libraries={'pandas': '1.4.2', 'pyarrow': '11.0.0', 'numpy': '1.21.2', 'scikit-learn': '1.1.2', 'lightgbm' : '3.3.5','xgboost' : '1.7.4'})
class TaxiFarePrediction(FlowSpec):

    data_url = Parameter("data_url", default=URL)

    def transform_features(self, df):

        obviously_bad_data_filters = [
            df.fare_amount > 0,         
            df.trip_distance <= 100,    
            df.trip_distance > 0,
            df.passenger_count > 0,
            df.mta_tax > 0,
            df.tip_amount >= 0,
            df.tolls_amount >= 0,
            df.total_amount > 0,
            df.PULocationID !=df.DOLocationID,
            df.hour > 0
        ]

        for f in obviously_bad_data_filters:
            df = df[f]

        
        return df

    @step
    def start(self):

        import pandas as pd
        from sklearn.model_selection import train_test_split

        self.df = self.transform_features(pd.read_parquet(self.data_url))
 
        self.X = self.df["trip_distance"].values.reshape(-1, 1)
        self.y = self.df["total_amount"].values
        self.next(self.model_linear_reg,self.model_elasticnet, self.model_bayesianridge,self.model_xgboost,self.model_lightgbm)
        
    @step
    def model_linear_reg(self):
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import cross_val_score
        

        self.reg = LinearRegression()
        self.scores = cross_val_score(self.reg, self.X, self.y, cv=5,scoring='r2')
        print("scores LR", self.scores)
        self.next(self.choose_model)

    @step
    def model_elasticnet(self):
        from sklearn.linear_model import ElasticNet
        from sklearn.model_selection import cross_val_score
        
        self.reg = ElasticNet()
        self.scores = cross_val_score(self.reg, self.X, self.y, cv=5,scoring='r2')
        print("scores EN", self.scores)
        self.next(self.choose_model) 




    @step
    def model_bayesianridge(self):
        from sklearn.linear_model import BayesianRidge
        from sklearn.model_selection import cross_val_score
        
        
        self.reg = BayesianRidge()
        self.scores = cross_val_score(self.reg, self.X, self.y, cv=5,scoring='r2')
        print("scores BR", self.scores)
        self.next(self.choose_model)
    

    @step
    def model_xgboost(self):
        from xgboost import XGBRegressor
        from sklearn.model_selection import cross_val_score
        
       
        self.reg = XGBRegressor() 
        self.scores = cross_val_score(self.reg, self.X, self.y, cv=5,scoring='r2')
        print("scores XG", self.scores)
        self.next(self.choose_model)

    @step
    def model_lightgbm(self):
        from lightgbm import LGBMRegressor
        from sklearn.model_selection import cross_val_score
        
       
        self.reg = LGBMRegressor()
        self.scores = cross_val_score(self.reg, self.X, self.y, cv=5,scoring='r2')
        print("scores LG", self.scores)
        self.next(self.choose_model)

    @card(type="corise")
    @step
    def choose_model(self, inputs):
        """
        find 'best' model
        """
        import numpy as np

        def score(inp):
            return inp.reg, np.mean(inp.scores)

        self.results = sorted(map(score, inputs), key=lambda x: -x[1])
        print(self.results)
        self.model = self.results[0][0]
        current.card.append(Markdown("# Taxi Fare Prediction Multiple Model Results"))
        # One card entry per model: the mean R^2 score, labeled with the estimator's class name
        for model, mean_score in self.results:
            current.card.append(Artifact(mean_score, name=type(model).__name__))
        self.next(self.end)

    
    @step
    def end(self):
        print("Scores:")
        print("\n".join("%s %f" % res for res in self.results))
        



if __name__ == "__main__":
    TaxiFarePrediction()

Overwriting ../flows/cloud/taxiprediction_multiple_models_flow.py


In [54]:
! python ../flows/cloud/taxiprediction_multiple_models_flow.py --environment=conda run

Metaflow 2.8.6+ob(v1) executing TaxiFarePrediction for user:sandbox
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
Bootstrapping conda environment...(this could take a few minutes)
2023-05-20 15:47:06.569 Workflow starting (run-id 895), see it in the UI at https://ui-pw-535851483.outerbounds.dev/TaxiFarePrediction/895
2023-05-20 15:47:07.524 [895/start/4593 (pid 32190)] Task is starting.
2023-05-20 15:47:13.370 [895/start/4593 (pid 32190)] Task finished successfully.
2023-05-20 15:47:14.222 [895/model_linear_reg/4594 (pid 32271)] Task is starting.
2023-05-20 15:47:14.954 [895/model_elastic

In [57]:
%%writefile ../flows/cloud/taxiprediction_xgboost_hyperopt.py
from metaflow import FlowSpec, step, card, conda_base, current, Parameter, Flow, trigger
from metaflow.cards import Markdown, Table, Image, Artifact

URL = "https://outerbounds-datasets.s3.us-west-2.amazonaws.com/taxi/latest.parquet"
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'

@trigger(events=['s3'])
@conda_base(libraries={'pandas': '1.4.2', 'pyarrow': '11.0.0', 'numpy': '1.21.2', 'scikit-learn': '1.1.2', 'lightgbm' : '3.3.5','xgboost' : '1.7.4'})
class TaxiFarePrediction(FlowSpec):

    data_url = Parameter("data_url", default=URL)

    def transform_features(self, df):

    
        obviously_bad_data_filters = [
            df.fare_amount > 0,         
            df.trip_distance <= 100,    
            df.trip_distance > 0,
            df.passenger_count > 0,
            df.mta_tax > 0,
            df.tip_amount >= 0,
            df.tolls_amount >= 0,
            df.total_amount > 0,
            df.PULocationID !=df.DOLocationID,
            df.hour > 0
        ]

        for f in obviously_bad_data_filters:
            df = df[f]

        
        return df

    @step
    def start(self):

        import pandas as pd
        from sklearn.model_selection import train_test_split

        self.df = self.transform_features(pd.read_parquet(self.data_url))
 
        self.X = self.df["trip_distance"].values.reshape(-1, 1)
        self.y = self.df["total_amount"].values
        self.next(self.make_grid)

    @step
    def make_grid(self):
        from sklearn.model_selection import ParameterGrid
        param_values = {'n_estimators': [100, 250, 500],
                        'max_depth': [4, 5, 6],
                        'learning_rate': [0.05, 0.1, 0.25]}

        self.grid_points = list(
            ParameterGrid(param_values)
        )
        
        self.next(self.model_xgboost, 
                  foreach='grid_points')
    


    @step
    def model_xgboost(self):
        from xgboost import XGBRegressor
        from sklearn.model_selection import cross_val_score
    
        self.reg = XGBRegressor(**self.input) 
        self.scores = cross_val_score(self.reg, self.X, self.y, cv=5,scoring='r2')
        self.next(self.choose_model)

   

    @step
    def choose_model(self, inputs):
        """
        find 'best' model
        """
        import numpy as np

        def score(inp):
            return inp.reg,\
                   np.mean(inp.scores)

            
        self.results = sorted(map(score, inputs), key=lambda x: -x[1]) 
        self.model = self.results[0][0]
        
        self.next(self.end)

    @card(type="corise")
    @step
    def end(self):
        """
        End of flow!
        """
        print('Scores:')
        print('\n'.join('%s %f' % res for res in self.results))
        current.card.append(Markdown("Best Model"))
        current.card.append(Artifact(self.model))
        current.card.append(Markdown("Score of Best Model"))
        current.card.append(Artifact(self.results[0][1]))


  


if __name__ == "__main__":
    TaxiFarePrediction()

Overwriting ../flows/cloud/taxiprediction_xgboost_hyperopt.py
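To make the size of the `foreach` fan-out concrete, here is a small sketch that expands the same `param_values` dictionary using only the standard library. It uses `itertools.product` as a stand-in for scikit-learn's `ParameterGrid`, so the ordering of points is an assumption and may differ from what `ParameterGrid` yields; the set of combinations is the same.

```python
from itertools import product

# Same search space as in the make_grid step
param_values = {
    "n_estimators": [100, 250, 500],
    "max_depth": [4, 5, 6],
    "learning_rate": [0.05, 0.1, 0.25],
}

# Build one dict per combination, mirroring the list that make_grid fans out over
keys = sorted(param_values)
grid_points = [
    dict(zip(keys, combo))
    for combo in product(*(param_values[k] for k in keys))
]

print(len(grid_points))  # 3 * 3 * 3 = 27 combinations
print(grid_points[0])
```

Each of the 27 dictionaries becomes `self.input` in one `model_xgboost` task, and every task runs its own 5-fold cross-validation. The hyperparameters hard-coded in the challenger flow later in this notebook (`learning_rate=0.05`, `max_depth=4`, `n_estimators=100`) correspond to one of these grid points.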


In [58]:
! python ../flows/cloud/taxiprediction_xgboost_hyperopt.py --environment=conda run 

Metaflow 2.8.6+ob(v1) executing TaxiFarePrediction for user:sandbox
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
Bootstrapping conda environment...(this could take a few minutes)
2023-05-20 16:08:54.897 Workflow starting (run-id 902), see it in the UI at https://ui-pw-535851483.outerbounds.dev/TaxiFarePrediction/902
2023-05-20 16:08:55.823 [902/start/4625 (pid 328)] Task is starting.
2023-05-20 16:09:01.519 [902/start/4625 (pid 328)] Task finished successfully.
2023-05-20 16:09:02.389 [902/make_grid/4628 (pid 409)] Task is starting.
2023-05-20 16:09:04.773 [902/make_grid/4628 (pid 409)] 

In [67]:
%%writefile ../flows/cloud/taxiprediction_challenger.py
from metaflow import FlowSpec, step, card, conda_base, current, project, Parameter, Flow, trigger, retry, timeout,catch
from metaflow.cards import Markdown, Table, Image, Artifact

URL = "https://outerbounds-datasets.s3.us-west-2.amazonaws.com/taxi/latest.parquet"
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'

@trigger(events=['s3'])
@conda_base(libraries={'pandas': '1.4.2', 'pyarrow': '11.0.0', 'numpy': '1.21.2', 'scikit-learn': '1.1.2', 'lightgbm' : '3.3.5','xgboost' : '1.7.4'})
@project(name="taxi_fare_prediction")
class TaxiFarePrediction(FlowSpec):

    data_url = Parameter("data_url", default=URL)

    def transform_features(self, df):

        obviously_bad_data_filters = [
            df.fare_amount > 0,         
            df.trip_distance <= 100,    
            df.trip_distance > 0,
            df.passenger_count > 0,
            df.mta_tax > 0,
            df.tip_amount >= 0,
            df.tolls_amount >= 0,
            df.total_amount > 0,
            df.PULocationID !=df.DOLocationID,
            df.hour > 0
        ]

        for f in obviously_bad_data_filters:
            df = df[f]

        
        return df

    @catch(var="read_failure")
    @retry(times=4)
    @timeout(minutes=10)
    @step
    def start(self):

        import pandas as pd
        from sklearn.model_selection import train_test_split

        self.df = self.transform_features(pd.read_parquet(self.data_url))

        self.X = self.df["trip_distance"].values.reshape(-1, 1)
        self.y = self.df["total_amount"].values
        self.next(self.xgb_model)

    @step
    def xgb_model(self):
        "Configure a single-feature XGBoost regressor; it is fit and scored during cross-validation in the validate step."
        from xgboost import XGBRegressor

        self.model = XGBRegressor(learning_rate=0.05,max_depth=4,n_estimators=100)

        self.next(self.validate)
                
    
    def gather_sibling_flow_run_results(self):

        # storage to populate and feed to a Table in a Metaflow card
        rows = []

        # loop through runs of this flow 
        for run in Flow(self.__class__.__name__):
            if run.id != current.run_id:
                if run.successful:
                    icon = "✅" 
                    msg = "OK"
                    score = str(run.data.scores.mean())
                else:
                    icon = "❌"
                    msg = "Error"
                    score = "NA"
                    for step in run:
                        for task in step:
                            if not task.successful:
                                msg = task.stderr
                row = [Markdown(icon), Artifact(run.id), Artifact(run.created_at.strftime(DATETIME_FORMAT)), Artifact(score), Markdown(msg)]
                rows.append(row)
            else:
                rows.append([Markdown("✅"), Artifact(run.id), Artifact(run.created_at.strftime(DATETIME_FORMAT)), Artifact(str(self.scores.mean())), Markdown("This run...")])
        return rows
                
    
    @card(type="corise")
    @step
    def validate(self):
        from sklearn.model_selection import cross_val_score
        self.scores = cross_val_score(self.model, self.X, self.y, cv=5,scoring='r2')
        current.card.append(Markdown("# Taxi Fare Prediction Challenger Results"))
        current.card.append(Table(self.gather_sibling_flow_run_results(), headers=["Pass/fail", "Run ID", "Created At", "R^2 score", "Stderr"]))
        self.next(self.end)


    @step
    def end(self):
        self.model_type = "xgboost"
        print("Score = %s" % self.scores.mean())


if __name__ == "__main__":
    TaxiFarePrediction()

Overwriting ../flows/cloud/taxiprediction_challenger.py


In [68]:
! python ../flows/cloud/taxiprediction_challenger.py --environment=conda --production --branch challenger argo-workflows create 

Metaflow 2.8.6+ob(v1) executing TaxiFarePrediction for user:sandbox
Project: taxi_fare_prediction, Branch: prod.challenger
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
Deploying taxifareprediction.prod.challenger.taxifareprediction to Argo Workflows...

The namespace of this production flow is
    production:mfprj-6iffsxtybx6fkjku-0-crtx
To analyze results of this production flow add this line in your notebooks:
    namespace("production:mfprj-6iffsxtybx6fkjku-0-crtx")


In [69]:
! python ../flows/cloud/taxiprediction_challenger.py --environment=conda --production --branch challenger argo-workflows trigger

Metaflow 2.8.6+ob(v1) executing TaxiFarePrediction for user:sandbox
Project: taxi_fare_prediction, Branch: prod.challenger
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
Workflow taxifareprediction.prod.challenger.taxifareprediction triggered on Argo Workflows (run-id argo-taxifareprediction.prod.challenger.taxifareprediction-wvvpc).
See the run in the UI at https://ui-pw-535851483.outerbounds.dev/TaxiFarePrediction/argo-taxifareprediction.prod.challenger.taxifareprediction-wvvpc


## Task 3: Analyze the results
Return to this notebook and read in the results of the champion and challenger flows using the Metaflow Client API.
<br><br>



In [72]:
from metaflow import Flow, namespace
import numpy as np

CHAMPION_MODEL_NAMESPACE = 'production:mfprj-ovzw7jjg7psagpyw-0-pcke'
CHALLENGER_MODEL_NAMESPACE = 'production:mfprj-6iffsxtybx6fkjku-0-crtx'

# R^2 can be arbitrarily negative, so start from -inf rather than -1
best_score = float("-inf"); winner = None; winner_namespace = None
for n in [CHAMPION_MODEL_NAMESPACE, CHALLENGER_MODEL_NAMESPACE]:
    namespace(n)
    run = Flow('TaxiFarePrediction').latest_successful_run
    # `scores` holds the 5-fold cross-validated R^2 values, so report their mean as R^2, not accuracy
    r2_score = np.mean(run.data.scores)
    print("Latest {} model had mean R^2 = {:.4f}".format(run.data.model_type, r2_score))
    if r2_score > best_score:
        best_score = r2_score
        winner = run.data.model_type
        winner_namespace = n
print("Winner is the {} model with a mean R^2 of {:.4f}".format(winner, best_score))


Latest baseline model had mean R^2 = 0.9199
Latest xgboost model had mean R^2 = 0.9305
Winner is the xgboost model with a mean R^2 of 0.9305


#### Questions
- Does your model perform better on the metrics you selected? 

Ans: Yes, the challenger model performs marginally better than the champion model (mean R^2 of about 0.930 versus 0.920).

- Think about your day job, how would you go about assessing whether to roll forward the production "champion" to your new model? 
    - What gives you confidence one model is better than another?
    - What kinds of information do you need to monitor to get buy-in from stakeholders that model A is preferable to model B?  

Ans: On the chosen metric, the R^2 score, the challenger model performs marginally better. The open question is whether it performs better consistently, so both models need to be monitored over a period of time on incoming real-time data.
Which model is preferable depends on several factors specific to the problem at hand: how easy the new model is to deploy compared with the existing one, which metrics matter for the problem being solved, prediction latency, and whether the performance gain is worth the cost at which it is achieved.
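One way to check consistency rather than a single mean is to compare the two models fold by fold. The sketch below assumes both models were scored on the same cross-validation folds. The per-fold values are hypothetical numbers chosen for illustration, not results from this notebook, and `paired_summary` is an illustrative helper name.

```python
from statistics import mean, stdev

# Hypothetical per-fold R^2 scores, for illustration only (not results from this notebook)
champion_folds = [0.918, 0.921, 0.917, 0.923, 0.920]
challenger_folds = [0.929, 0.932, 0.927, 0.933, 0.931]


def paired_summary(a, b):
    """Summarize per-fold differences b - a: mean, sample std dev, and folds won by b."""
    diffs = [y - x for x, y in zip(a, b)]
    wins = sum(d > 0 for d in diffs)
    return mean(diffs), stdev(diffs), wins


mean_diff, sd_diff, wins = paired_summary(champion_folds, challenger_folds)
print("mean improvement: %.4f (sd %.4f), challenger wins %d/%d folds"
      % (mean_diff, sd_diff, wins, len(champion_folds)))
```

A small mean improvement that shows up in every fold with low spread is more convincing than a larger one that flips sign between folds. In practice, the lists could be filled from `run.data.scores` of each flow's latest successful run.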

## CONGRATULATIONS! 🎉✨🍾
If you made it this far, you have completed the Full Stack Machine Learning Corise course. 
We are so glad that you chose to learn with us, and hope to see you again in future courses. Stay tuned for more content and come join us in [Slack](http://slack.outerbounds.co/) to keep learning about Metaflow!