# The spring-up and the prosperity of the digital economy

## In this tutorial we will use the dataset with the following attributes as an example:
**2 Attributes:**

1. Session_id
2. DateTime
3. User_id
4. Product
5. Campaign_id
6. Webpage_id
7. Product_category_1
8. Product_category_2
9. User_group_id
10. gender
11. age_level
12. user_depth
13. city_development_index
14. var_1
15. is_click(The response we focused on. Whether the user click or not)

Datasets comes from: https://www.kaggle.com/datasets/arashnic/ctr-in-advertisement?resource=download

This tutorial aims at using SVM to classify whether the user click the advertisement or not. For the company like Google, it should determine what advertisement to recommend so that the click rate is higher, and how much it should charge from advertisement companies. Also, product company should determine how much benefits can be acquired with the advertisement.


## Background

At present, the digital economy has become a new form of economic and social development after the agricultural and industrial economy, as well as a typical representative of the new round of industrial revolution. Along with the development of cloud computing, big data, artificial intelligence and industrial Internet, a new round of information technology revolution is breaking out and the digital economy is rising.

 

Since the birth of the world's first general-purpose electronic computer at the University of Pennsylvania in 1946, it has opened the curtain of the digital era for mankind. However, as an economic form, the digital economy, in fact, has emerged as early as the development of the semiconductor industry. Today, the digital economy is ubiquitous in our lives, and innovative digital economy applications such as mobile payments, gig economy and advertisement auction have influenced every aspect of our daily lives.

 

Vigorous development of the digital economy has also become a global consensus. According to the World Internet Development Report 2018, the global digital economy reached US$12.9 trillion in 2017, with the United States and China ranking among the top two in the world. The new economy represented by the digital economy is now flourishing and has become a new engine to drive global economic growth.

**Interesting Article:** https://www.brookings.edu/research/the-fourth-industrial-revolution-and-digitization-will-transform-africa-into-a-global-powerhouse/


## Google Ads and Ads Auction

Advertisements are everywhere that we see, omnipresent presence makes them impressive and sometimes influence our expense plans. Unlike previous advertisement on television or newspaper, because of the prevalent usage of digital devices, people can see advertisements most of the time. 

One good example is **Google Ads**(formerly **Google AdWords**). When we use the search engine, some advertisements will be shown on the screen to attract users. To determine which advertisement to be shown to platform users, Google will determine by the bid (The maximum amount that company is willing to pay for a click), the quality of ads and the expected impact from the ad extensions and other ad formats. 

Thus, it is important also to determine which to display on the screen, determined by users' interests, for example. An example database of click rate is here for you to practise: https://www.kaggle.com/datasets/arashnic/ctr-in-advertisement

Example codes are also in the tutorial.


## Gig Economy

In such a busy and fast-pace era, new form of economy is burgeoning. According to the definition in dictionary, gig economy is "*an economic sector consisting of part-time, temporary and freelance jobs*" Gig economy is also ubiquitous like the daily platform we use, Uber, Uber Eats, Lyft, etc. 

Reading Reference: https://gadallon.substack.com/p/the-future-of-the-gig-economy-growth?r=zgog


## Code Example


In [35]:
# First, Import all packages
import numpy as np
import pandas as pd
import gensim
from gensim.models import KeyedVectors
from gensim import models
from sklearn.model_selection import train_test_split
from gensim.test.utils import common_texts
from gensim.models import Word2Vec
from sklearn.svm import SVC

In [36]:
# The data is already separated into training and testing
dataset_train = pd.read_csv('Ad_click_prediction_train.csv')
dataset_test = pd.read_csv('Ad_Click_prediciton_test.csv')
dataset_train

Unnamed: 0,session_id,DateTime,user_id,product,campaign_id,webpage_id,product_category_1,product_category_2,user_group_id,gender,age_level,user_depth,city_development_index,var_1,is_click
0,140690,2017-07-02 00:00,858557,C,359520,13787,4,,10.0,Female,4.0,3.0,3.0,0,0
1,333291,2017-07-02 00:00,243253,C,105960,11085,5,,8.0,Female,2.0,2.0,,0,0
2,129781,2017-07-02 00:00,243253,C,359520,13787,4,,8.0,Female,2.0,2.0,,0,0
3,464848,2017-07-02 00:00,1097446,I,359520,13787,3,,3.0,Male,3.0,3.0,2.0,1,0
4,90569,2017-07-02 00:01,663656,C,405490,60305,3,,2.0,Male,2.0,3.0,2.0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
463286,583588,2017-07-07 23:59,572718,H,118601,28529,5,82527.0,4.0,Male,4.0,3.0,2.0,0,0
463287,198389,2017-07-07 23:59,130461,I,118601,28529,4,82527.0,10.0,Female,4.0,3.0,2.0,1,0
463288,563423,2017-07-07 23:59,306241,D,118601,28529,4,82527.0,2.0,Male,2.0,3.0,,0,0
463289,595571,2017-07-07 23:59,306241,D,118601,28529,5,82527.0,2.0,Male,2.0,3.0,,0,0


In [37]:
# Remove NA data
dataset_train_rm = dataset_train.dropna()
dataset_test_rm = dataset_test.dropna()
dataset_train_rm

Unnamed: 0,session_id,DateTime,user_id,product,campaign_id,webpage_id,product_category_1,product_category_2,user_group_id,gender,age_level,user_depth,city_development_index,var_1,is_click
17,2927,2017-07-02 00:03,295456,I,404347,53587,1,146115.0,9.0,Female,3.0,3.0,3.0,1,0
21,3803,2017-07-02 00:03,312475,I,404347,53587,1,146115.0,2.0,Male,2.0,3.0,4.0,1,0
42,2670,2017-07-02 00:05,649512,I,404347,53587,1,146115.0,2.0,Male,2.0,3.0,1.0,1,0
48,390567,2017-07-02 00:06,99306,H,105960,11085,5,270915.0,4.0,Male,4.0,3.0,2.0,0,0
49,381228,2017-07-02 00:06,99306,H,105960,11085,5,270915.0,4.0,Male,4.0,3.0,2.0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
463279,579414,2017-07-07 23:59,563083,H,118601,28529,5,82527.0,3.0,Male,3.0,3.0,2.0,0,0
463280,547394,2017-07-07 23:59,1132443,G,118601,28529,5,82527.0,3.0,Male,3.0,3.0,4.0,0,0
463281,393785,2017-07-07 23:59,12050,I,118601,28529,4,82527.0,3.0,Male,3.0,3.0,3.0,0,0
463286,583588,2017-07-07 23:59,572718,H,118601,28529,5,82527.0,4.0,Male,4.0,3.0,2.0,0,0


In [42]:
# Seperate x and y
x_train = dataset_train_rm.loc[:,'product':'var_1']
x_test = dataset_test_rm.loc[:,'product':'var_1']

# Convert "Product" String into float
# We use One Hot Encoding here
dummies = pd.get_dummies(x_train, columns=['product','gender'])
merged = pd.concat([x_train,dummies],axis='columns')

dummies_test = pd.get_dummies(x_test, columns=['product','gender'])
merged_test = pd.concat([x_test,dummies_test],axis='columns')

x_test = merged_test.drop(['product','gender'], axis='columns')

x_train = merged.drop(['product','gender'], axis='columns')
print(x_train)


y_train = dataset_train_rm.loc[:,'is_click']

        campaign_id  webpage_id  product_category_1  product_category_2  \
17           404347       53587                   1            146115.0   
21           404347       53587                   1            146115.0   
42           404347       53587                   1            146115.0   
48           105960       11085                   5            270915.0   
49           105960       11085                   5            270915.0   
...             ...         ...                 ...                 ...   
463279       118601       28529                   5             82527.0   
463280       118601       28529                   5             82527.0   
463281       118601       28529                   4             82527.0   
463286       118601       28529                   5             82527.0   
463287       118601       28529                   4             82527.0   

        user_group_id  age_level  user_depth  city_development_index  var_1  \
17                9.

In [39]:
# Start SVM classifier
svm_classifier = SVC()
svm_classifier.fit(x_train, y_train)

SVC()

In [40]:
# Check the Accuracy
svm_classifier.score(x_train, y_train)


0.9368841178089733

In [43]:
# Make Prediction
svm_classifier.predict(x_test)

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)