<p align="center"><img width="50%" src="https://aimodelsharecontent.s3.amazonaws.com/aimodshare_banner.jpg" /></p>


---


<p align="center"><h1 align="center">Internet Ad Model Submission Guide - Predictions Only (No Model Metadata Extraction)</h1></p>

##### <p align="center">*Data Source: Lichman, M. (2013). [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml). Irvine, CA: University of California, School of Information and Computer Science.*</p>

---
Let's share our models on a centralized leaderboard so that we can collaborate and learn from the model experimentation process.

**Instructions:**
1. Get data in and set up X_train / X_test / y_train
2. Preprocess data
3. Fit model on preprocessed data
4. Generate predictions from X_test data and submit to competition
5. Repeat submission process to improve place on leaderboard



# Objective: Predict whether an image is an advertisement (ad.) or not (nonad.)

---

**Data**: This dataset represents a set of possible advertisements on Internet pages. The features encode the geometry of the image (if available) as well as phrases occurring in the URL, the image's URL and alt text, the anchor text, and words occurring near the anchor text.

**Features (1558 features)**
* **height** height of image
* **width** width of image
* **aratio** aspect ratio of image
* **local** binary flag, stored as the `local` column in the data
* **URL Terms** 457 features of page URLs
* **orig URL Terms** 495 features from original image URLs
* **anc URL Terms** 472 features from anchor URLs
* **alt Terms** 111 features from image alt text
* **caption Terms** 19 features from image captions

**Target**
* Binary variable (ad./nonad.)

## 1. Get data in and set up X_train, X_test, y_train objects

```python
# Install the aimodelshare library
! pip install aimodelshare --upgrade
```

```python
# Get competition data
from aimodelshare import download_data

download_data('public.ecr.aws/y2e2a1d6/internet_ads_competition_data-repository:latest')
```


Data downloaded successfully.


```python
# Load data into X_train, y_train, and X_test
import pandas as pd

X_train = pd.read_csv("internet_ads_competition_data/X_train.csv")
# read_csv's `squeeze` argument was removed in pandas 2.0;
# squeeze the single-column frame into a Series instead
y_train = pd.read_csv("internet_ads_competition_data/y_train.csv").squeeze("columns")
X_test = pd.read_csv("internet_ads_competition_data/X_test.csv")

X_train.head()
```

Unnamed: 0,height,width,aratio,local,url*images+buttons,url*likesbooks.com,url*www.slake.com,url*hydrogeologist,url*oso,url*media,url*peace+images,url*blipverts,url*tkaine+kats,url*labyrinth,url*advertising+blipverts,url*images+oso,url*area51+corridor,url*ran+gifs,url*express-scripts.com,url*off,url*cnet,url*time+1998,url*josefina3,url*truluck.com,url*clawnext+gif,url*autopen.com,url*tvgen.com,url*pixs,url*heartland+5309,url*meadows+9196,url*blue,url*ad+gif,url*area51,url*www.internauts.ca,url*afn.org,url*ran.org,url*shareware.com,url*baons+images,url*area51+labyrinth,url*pics,...,alt*site,alt*to+visit,alt*rank+my,alt*from,alt*page,alt*graphic,alt*like+mine,alt*email+me,alt*visit,alt*free,alt*the+kat,alt*award,alt*services,alt*about,alt*for,alt*here+to,alt*network,alt*you,alt*logo,alt*home,alt*kat,caption*and,caption*home+page,caption*click+here,caption*the,caption*pratchett,caption*here+for,caption*site,caption*page,caption*to,caption*of,caption*home,caption*my,caption*your,caption*in,caption*bytes,caption*here,caption*click,caption*for,caption*you
0,60.0,468.0,7.8,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,120.0,120.0,1.0,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,90.0,128.0,1.4222,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,24.0,120.0,5.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,77.0,108.0,1.4025,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


```python
# Check the dimensions of each object
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
```

(2623, 1558)
(656, 1558)
(2623,)


## 2. Preprocess data

```python
# Write and execute code to preprocess data here
```
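
As one possible starting point (not part of the original guide), the sketch below fills missing values with the column median and standardizes the features using scikit-learn. The image geometry is only recorded "if available", so imputation is a plausible first step. The helper name `build_preprocessor`, the toy data, and the choice of median imputation plus scaling are all assumptions for illustration, not requirements of the competition.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_preprocessor():
    """Hypothetical helper: median imputation followed by standardization."""
    return make_pipeline(SimpleImputer(strategy="median"), StandardScaler())


# Toy stand-in for X_train / X_test (made-up values, not the real data).
toy_train = pd.DataFrame({"height": [60.0, np.nan, 90.0, 24.0],
                          "width": [468.0, 120.0, np.nan, 120.0],
                          "url*ad+gif": [1, 0, 0, 0]})
toy_test = pd.DataFrame({"height": [np.nan],
                         "width": [100.0],
                         "url*ad+gif": [0]})

preprocessor = build_preprocessor()
# Fit on the training data only, then apply the same transformation to the test data.
toy_train_prepared = preprocessor.fit_transform(toy_train)
toy_test_prepared = preprocessor.transform(toy_test)

print(toy_train_prepared.shape, toy_test_prepared.shape)
```

In the notebook, the same pattern would be `preprocessor.fit_transform(X_train)` followed by `preprocessor.transform(X_test)`, so that statistics learned from the training data are reused on the test data.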

## 3. Fit model on preprocessed data

```python
# Write and execute code to fit model on preprocessed data here.
```
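
A minimal sketch of this step, assuming scikit-learn and a logistic regression classifier; the guide itself does not prescribe a model. The data here is randomly generated so the block runs on its own, and the label strings `"ad."` and `"nonad."` are taken from the objective above rather than checked against `y_train`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Made-up stand-in for preprocessed features: 40 rows, 5 columns.
toy_X = rng.normal(size=(40, 5))
# Made-up labels that depend only on the sign of the first column.
toy_y = np.where(toy_X[:, 0] > 0, "ad.", "nonad.")

model = LogisticRegression(max_iter=1000)
model.fit(toy_X, toy_y)

# Convert the predictions to a plain Python list of labels.
toy_predictions = list(model.predict(toy_X[:10]))
print(toy_predictions)
```

With the real data, the model would be fit on the preprocessed `X_train` and `y_train`, and predictions would be generated from the preprocessed `X_test`.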

```python
# Set credentials using modelshare.org username/password
from aimodelshare.aws import set_credentials

# This is the unique REST API URL that powers this Internet Ad Playground
apiurl = "https://vy08zh602l.execute-api.us-east-1.amazonaws.com/prod/m"

set_credentials(apiurl=apiurl)
```

AI Modelshare Username:··········
AI Modelshare Password:··········
AI Model Share login credentials set successfully.


```python
# Instantiate Competition
import aimodelshare as ai

mycompetition = ai.Competition(apiurl)
```

## **4. Submit Model predictions to leaderboard (without extracting model architecture information):**
- Model metadata extraction, which this predictions-only workflow skips, is what allows you to use the `compare_models()` and `instantiate_model()` functions.

```python
# Generate a list of predictions using X_test data

# This example uses randomly chosen values from y_train to generate a list of predictions
predicted_values = list(y_train.sample(n=len(X_test.index)))
```
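
Before submitting, it can help to confirm that the prediction list has one entry per test row and contains only known labels. The helper below is a hypothetical sketch written for this guide, not part of the aimodelshare API, and it assumes the leaderboard expects exactly one label per row of `X_test`.

```python
def check_predictions(predictions, expected_length, allowed_labels):
    """Hypothetical helper: raise ValueError if the prediction list looks malformed."""
    if len(predictions) != expected_length:
        raise ValueError(
            f"expected {expected_length} predictions, got {len(predictions)}"
        )
    unexpected = set(predictions) - set(allowed_labels)
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    return True


# Toy usage with made-up values.
toy_ok = check_predictions(["ad.", "nonad.", "nonad."], 3, {"ad.", "nonad."})
print(toy_ok)
```

In the notebook, a comparable call would be `check_predictions(predicted_values, len(X_test), set(y_train))`.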

```python
# Submit model predictions to leaderboard (without extracting model architecture information)

# Submit Model 1 to Competition Leaderboard
mycompetition.submit_model(model_filepath=None,
                           preprocessor_filepath=None,
                           prediction_submission=predicted_values)
```

Insert search tags to help users find your model (optional): 
Provide any useful notes about your model (optional): 

Your model has been submitted as model version 11

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:1462


```python
# Get leaderboard to explore current best model architectures

# Get raw data in pandas data frame
data = mycompetition.get_leaderboard()

# Stylize leaderboard data
mycompetition.stylize_leaderboard(data)
```

| | accuracy | f1_score | precision | recall | ml_framework | model_type | username | version |
|---|---|---|---|---|---|---|---|---|
| 0 | 76.37% | 53.32% | 54.17% | 53.14% | unknown | unknown | Analytics_Prac | 1 |
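
When reading accuracy on the leaderboard, it helps to know what a random baseline like the one used in this guide would be expected to score. If a fraction p of the labels is "ad." and predictions are drawn at random with the same frequencies, a prediction matches the true label with probability p² + (1 − p)². The sketch below computes this; the value 0.14 is an assumed class fraction for illustration only, not a figure measured from this dataset.

```python
def random_baseline_accuracy(positive_fraction):
    """Expected accuracy when labels are drawn at random with the true class frequencies."""
    p = positive_fraction
    # Both classes contribute: P(predict ad. and true ad.) + P(predict nonad. and true nonad.)
    return p * p + (1 - p) * (1 - p)


# 0.14 is an assumed class fraction for illustration, not measured from this dataset.
assumed_fraction = 0.14
print(f"{random_baseline_accuracy(assumed_fraction):.4f}")  # prints 0.7592
```

In the notebook, `y_train.value_counts(normalize=True)` gives the actual class fractions to plug in. Because accuracy can look high on imbalanced data, the f1_score, precision, and recall columns are worth reading alongside it.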


You can also compare two or more models on the leaderboard, provided they were submitted with model metadata extraction (see the code tab for this competition at www.modelshare.org for submission examples):
```
data=mycompetition.compare_models([1,2], verbose=1)
mycompetition.stylize_compare(data)
```



##### (Optional Extension) Submit Model With Custom Metadata:
Use this option to add team names or any other missing data you wish to share on the leaderboard.


```python
# Custom metadata can be added by passing a dict to the custom_metadata argument of the submit_model() method
# This option can be used to fill in missing data points or add new columns to the leaderboard

custom_meta = {'team': 'team one',
               'model_type': 'your_model_type',
               'new_column': 'new metadata'}

mycompetition.submit_model(model_filepath=None,
                           preprocessor_filepath=None,
                           prediction_submission=predicted_values,
                           custom_metadata=custom_meta)
```