Hello, Kagglers. The purpose of this notebook is to see whether a machine learning model can predict the Global_Sales a game will have. But first, we have to load and visualize the data.

**Initial data analysis**

In this notebook, we will use data manipulation and visualization, followed by a regression algorithm, to try to predict the Global_Sales of a video game.

**Data loading and Initialization**

In [None]:
suppressWarnings(library(ggplot2)) # Data visualization
suppressWarnings(library(readr)) # CSV file I/O, e.g. the read_csv function
suppressWarnings(library(sqldf))
suppressWarnings(library(fmsb))
suppressWarnings(library(plotly))


# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

system("ls ../input")


games<-read.csv('../input/vgsales.csv')
games[games=="N/A"]<-NA
games<-games[complete.cases(games),]
games$Year<-as.numeric(as.character(games$Year))
games<-games[games$Year<=2016,]
# Any results you write to the current directory are saved as output.
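
As an alternative to replacing "N/A" after loading, `read.csv` can treat that string as missing while reading, through its `na.strings` argument. The sketch below uses a tiny made-up CSV (the rows are illustrative, not taken from vgsales.csv) so that it runs without the input file.

```r
# Toy CSV standing in for ../input/vgsales.csv (rows are made up for illustration).
csv_text <- "Rank,Name,Year,Publisher,Global_Sales
1,Game A,2006,Pub X,82.7
2,Game B,N/A,Pub Y,40.2
3,Game C,2020,N/A,35.8
4,Game D,2010,Pub X,33.0"

# na.strings turns every "N/A" field into NA while reading,
# so Year is parsed as a number straight away.
toy <- read.csv(text = csv_text, na.strings = "N/A", stringsAsFactors = FALSE)

# Drop incomplete rows, then keep releases up to 2016, as in the cell above.
toy <- toy[complete.cases(toy), ]
toy <- toy[toy$Year <= 2016, ]
toy
```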

Let's visualize what we are dealing with. 

In [None]:
head(games)

Okay, so the dataset has games, their platform, year of release, genre, publisher and Sales by region. There's also the Global_Sales, which we are trying to predict, and it's the sum of the sales of every region. 

**First Data Manipulation**

Since Global_Sales is simply the sum of the regional sales, keeping the regional columns as predictors would give the answer away, so it's safe to remove them.

In [None]:
games <- games[!(names(games) %in% c("NA_Sales","EU_Sales","JP_Sales","Other_Sales"))]
head(games)
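
To see why the regional columns would give the answer away, the following sketch checks on made-up numbers that the regional sales add up to Global_Sales. The values are illustrative only. On the real data small rounding differences may appear, so the comparison uses a tolerance (0.05 here is an arbitrary choice).

```r
# Illustrative rows only; the column names mirror the dataset.
toy <- data.frame(
  NA_Sales     = c(4.0, 1.5),
  EU_Sales     = c(3.0, 0.5),
  JP_Sales     = c(1.0, 2.0),
  Other_Sales  = c(0.5, 0.25),
  Global_Sales = c(8.5, 4.25)
)

# Sum the four regional columns row by row.
regional_sum <- rowSums(toy[, c("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales")])

# Largest gap between the regional sum and the reported global figure.
max_gap <- max(abs(regional_sum - toy$Global_Sales))
max_gap < 0.05
```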

Great, now we only have the Global_Sales as a sales factor. There are some interesting visualizations and insights from the sales by region, but I'm not going to cover them in this notebook.  Next, let's view the top 20 games by Global_Sales. Since the dataset is already ordered that way, we just have to use the rank as a condition.

In [None]:
games[games$Rank <= 20,]

We can see different but related games here, like Pokemon Red/Blue and Pokemon Gold/Silver, which are both part of the Pokemon franchise. Let's try creating a new feature called Franchise, based on these names. One way to automate this would be to iterate over the words in each title, check whether they repeat in other titles, and create a franchise based on that, but that would be really complicated. So instead, let's go with the hacky solution and do it manually, starting with the Nintendo franchises.

In [None]:
#Initialize in NA 
games$Franchise <- NA
#Most popular nintendo Franchises
games$Franchise[games$Name %in% grep("Wii", games$Name, value = TRUE)]<- "Wii Series"
games$Franchise[games$Name %in% grep("Animal Crossing", games$Name, value = TRUE)]<- "Animal Crossing"
games$Franchise[games$Name %in% grep("Brain Age", games$Name, value = TRUE)]<- "Brain Age"
games$Franchise[games$Name %in% grep("Donkey Kong", games$Name, value = TRUE)]<- "Donkey Kong"
games$Franchise[games$Name %in% grep("F-Zero", games$Name, value = TRUE)]<- "F-Zero"
games$Franchise[games$Name %in% grep("Fire Emblem", games$Name, value = TRUE)]<- "Fire Emblem"
games$Franchise[games$Name %in% grep("Golden Sun", games$Name, value = TRUE)]<- "Golden Sun"
games$Franchise[games$Name %in% grep("Kid Icarus", games$Name, value = TRUE)]<- "Kid Icarus"
games$Franchise[games$Name %in% grep("Kirby", games$Name, value = TRUE)]<- "Kirby"
games$Franchise[games$Name %in% grep("Zelda", games$Name, value = TRUE)]<- "Zelda"
games$Franchise[games$Name %in% grep("Mario", games$Name, value = TRUE)]<- "Mario"
games$Franchise[games$Name %in% grep("Metroid", games$Name, value = TRUE)]<- "Metroid"
games$Franchise[games$Name %in% grep("Mother", games$Name, value = TRUE) & games$Publisher == "Nintendo"]<- "Mother"
games$Franchise[games$Name %in% grep("EarthBound", games$Name, value = TRUE)]<- "Mother"
games$Franchise[games$Name %in% grep("Wars", games$Name, value = TRUE) & games$Publisher=="Nintendo" & games$Genre =="Strategy"]<- "Nintendo Wars"
games$Franchise[games$Name %in% grep("Pikmin", games$Name, value = TRUE)]<- "Pikmin"
games$Franchise[games$Name %in% grep("Pilotwings", games$Name, value = TRUE)]<- "Pilotwings"
games$Franchise[games$Name %in% grep("Pokemon", games$Name, value = TRUE)]<- "Pokemon"
games$Franchise[games$Name %in% grep("Poké", games$Name, value = TRUE)]<- "Pokemon"
games$Franchise[games$Name %in% grep("Punch-", games$Name, value = TRUE)]<- "Punch-Out"
games$Franchise[games$Name %in% grep("Puzzle League", games$Name, value = TRUE)]<- "Puzzle League"
games$Franchise[games$Name %in% grep("Star Fox", games$Name, value = TRUE)]<- "Star Fox"
games$Franchise[games$Name %in% grep("Super Smash Bros", games$Name, value = TRUE)]<- "Super Smash Bros"
games$Franchise[games$Name %in% grep("Chibi-", games$Name, value = TRUE)]<- "Chibi-Robo"
games$Franchise[games$Name %in% grep("Custom Robo", games$Name, value = TRUE)]<- "Custom Robo"
games$Franchise[games$Name %in% grep("Yoshi", games$Name, value = TRUE)]<- "Yoshi"
games$Franchise[games$Name %in% grep("Wario", games$Name, value = TRUE)]<- "Wario"
games$Franchise[games$Name %in% grep("Tetris", games$Name, value = TRUE)]<- "Tetris"
games$Franchise[games$Name %in% grep("Nintendogs", games$Name, value = TRUE)]<- "Nintendogs"
games$Franchise[games$Name %in% grep("Duck Hunt", games$Name, value = TRUE)]<- "Duck Hunt"

games[games$Publisher=="Nintendo" ,]
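
The same tagging can be written more compactly with a lookup table and `grepl`. This is only a sketch on a handful of illustrative titles, and the `patterns` vector is a hypothetical name. It also uses `fixed = TRUE` to match the text literally, whereas the assignments above treat each pattern as a regular expression.

```r
# Toy titles (illustrative only) standing in for games$Name.
toy <- data.frame(
  Name = c("Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
           "Pokemon Red/Pokemon Blue", "Tetris"),
  stringsAsFactors = FALSE
)

# Pattern -> franchise label. Order matters: later entries overwrite
# earlier ones, just like a sequence of separate assignments would.
patterns <- c("Wii" = "Wii Series", "Mario" = "Mario",
              "Pokemon" = "Pokemon", "Tetris" = "Tetris")

toy$Franchise <- NA_character_
for (p in names(patterns)) {
  # fixed = TRUE matches the pattern literally instead of as a regex.
  hit <- grepl(p, toy$Name, fixed = TRUE)
  toy$Franchise[hit] <- patterns[[p]]
}
toy
```

Note that "Mario Kart Wii" ends up as Mario because the Mario pattern is applied after the Wii one.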

That's a start. Now let's do the same with the Sony franchises, and then look at their first-party games.

In [None]:
#Sony franchises
games$Franchise[games$Name %in% grep("Gran Turismo", games$Name, value = TRUE)]<- "Gran Turismo"
games$Franchise[games$Name %in% grep("Crash Bandicoot", games$Name, value = TRUE)]<- "Crash Bandicoot"
games$Franchise[games$Name %in% grep("Crash", games$Name, value = TRUE) & games$Publisher == "Vivendi Games"]<- "Crash Bandicoot"
games$Franchise[games$Name %in% grep("Tekken", games$Name, value = TRUE)]<- "Tekken"
games$Franchise[games$Name %in% grep("Uncharted", games$Name, value = TRUE)]<- "Uncharted"
games$Franchise[games$Name %in% grep("LittleBigPlanet", games$Name, value = TRUE)]<- "LittleBigPlanet"
games$Franchise[games$Name %in% grep("Spyro", games$Name, value = TRUE)]<- "Spyro"
games$Franchise[games$Name %in% grep("God of War", games$Name, value = TRUE)]<- "God of War"
games$Franchise[games$Name %in% grep("Jak", games$Name, value = TRUE) & games$Publisher == "Sony Computer Entertainment"]<- "Jak"
games$Franchise[games$Name %in% grep("Daxter", games$Name, value = TRUE)]<- "Jak"
games[games$Publisher =="Sony Computer Entertainment",]

Next, the same thing but with Microsoft franchises. 

In [None]:
#Microsoft Franchises
games$Franchise[games$Name %in% grep("Kinect", games$Name, value = TRUE)]<- "Kinect"
games$Franchise[games$Name %in% grep("Fable", games$Name, value = TRUE)]<- "Fable"
games$Franchise[games$Name %in% grep("Halo", games$Name, value = TRUE)]<- "Halo"
games$Franchise[games$Name %in% grep("Gears of War", games$Name, value = TRUE)]<- "Gears of War"
games$Franchise[games$Name %in% grep("Forza", games$Name, value = TRUE)]<- "Forza"
games$Franchise[games$Name %in% grep("Microsoft", games$Name, value = TRUE)]<- "Microsoft Games"



games[games$Publisher %in% grep("Microsoft", games$Publisher, value = TRUE),]

Now for all the other popular franchises (there are a lot of them).

In [None]:
#Other popular franchises that are not part of the big 3 console makers.
#I know I missed a lot
games$Franchise[games$Name %in% grep("007", games$Name, value = TRUE)]<- "James Bond"
games$Franchise[games$Name %in% grep("Ace Attorney", games$Name, value = TRUE)]<- "Ace Attorney"
games$Franchise[games$Name %in% grep("Age of Empire", games$Name, value = TRUE)]<- "Age of Empires"
games$Franchise[games$Name %in% grep("Alan Wake", games$Name, value = TRUE)]<- "Alan Wake"
games$Franchise[games$Name %in% grep("Alex Kidd", games$Name, value = TRUE)]<- "Alex Kidd"
games$Franchise[games$Name %in% grep("Ape Escape", games$Name, value = TRUE)]<- "Ape Escape"
games$Franchise[games$Name %in% grep("ARMA", games$Name, value = TRUE)]<- "ARMA"
games$Franchise[games$Name %in% grep("Assassin's Creed", games$Name, value = TRUE)]<- "Assassin's Creed"
games$Franchise[games$Name %in% grep("Atelier", games$Name, value = TRUE)]<- "Atelier"
games$Franchise[games$Name %in% grep("Baldur", games$Name, value = TRUE)]<- "Baldur"
games$Franchise[games$Name %in% grep("Banjo", games$Name, value = TRUE)]<- "Banjo"
games$Franchise[games$Name %in% grep("Baten Kaitos", games$Name, value = TRUE)]<- "Baten Kaitos"
games$Franchise[games$Name %in% grep("Batman", games$Name, value = TRUE)]<- "Batman"
games$Franchise[games$Name %in% grep("Battlefield", games$Name, value = TRUE)]<- "Battlefield"
games$Franchise[games$Name %in% grep("Big Brain", games$Name, value = TRUE)]<- "Big Brain"
games$Franchise[games$Name %in% grep("Bayonetta", games$Name, value = TRUE)]<- "Bayonetta"
games$Franchise[games$Name %in% grep("Binding of Isaac", games$Name, value = TRUE)]<- "Binding of Isaac"
games$Franchise[games$Name %in% grep("Bioshock", games$Name, value = TRUE)]<- "Bioshock"
games$Franchise[games$Name %in% grep("Bomberman", games$Name, value = TRUE)]<- "Bomberman"
games$Franchise[games$Name %in% grep("Borderlands", games$Name, value = TRUE)]<- "Borderlands"
games$Franchise[games$Name %in% grep("Bravely", games$Name, value = TRUE)]<- "Bravely"
games$Franchise[games$Name %in% grep("Breath of Fire", games$Name, value = TRUE)]<- "Breath of Fire"
games$Franchise[games$Name %in% grep("Bubsy", games$Name, value = TRUE)]<- "Bubsy"
games$Franchise[games$Name %in% grep("Burnout", games$Name, value = TRUE)]<- "Burnout"
games$Franchise[games$Name %in% grep("Call of Duty", games$Name, value = TRUE)]<- "Call of Duty"
games$Franchise[games$Name %in% grep("Castlevania", games$Name, value = TRUE)]<- "Castlevania"
games$Franchise[games$Name %in% grep("Chrono", games$Name, value = TRUE)]<- "Chrono"
games$Franchise[games$Name %in% grep("Civilization", games$Name, value = TRUE)]<- "Civilization"
games$Franchise[games$Name %in% grep("Conker", games$Name, value = TRUE)]<- "Conker"
games$Franchise[games$Name %in% grep("Contra", games$Name, value = TRUE)]<- "Contra"
games$Franchise[games$Name %in% grep("Cooking Mama", games$Name, value = TRUE)]<- "Cooking Mama"
games$Franchise[games$Name %in% grep("Counter-", games$Name, value = TRUE)]<- "Counter Strike"
games$Franchise[games$Name %in% grep("Crazy Taxi", games$Name, value = TRUE)]<- "Crazy Taxi"
games$Franchise[games$Name %in% grep("Crysis", games$Name, value = TRUE)]<- "Crysis"
games$Franchise[games$Name %in% grep("Dead or Alive", games$Name, value = TRUE)]<- "Dead or Alive"
games$Franchise[games$Name %in% grep("Dead Rising", games$Name, value = TRUE)]<- "Dead Rising"
games$Franchise[games$Name %in% grep("Dead Space", games$Name, value = TRUE)]<- "Dead Space"
games$Franchise[games$Name %in% grep("Deus Ex", games$Name, value = TRUE)]<- "Deus Ex"
games$Franchise[games$Name %in% grep("Diablo", games$Name, value = TRUE)]<- "Diablo"
games$Franchise[games$Name %in% grep("Devil May Cry", games$Name, value = TRUE)]<- "Devil May Cry"
games$Franchise[games$Name %in% grep("Disgaea", games$Name, value = TRUE)]<- "Disgaea"
games$Franchise[games$Name %in% grep("Doom", games$Name, value = TRUE)]<- "Doom"
games$Franchise[games$Name %in% grep("Double Dragon", games$Name, value = TRUE)]<- "Double Dragon"
games$Franchise[games$Name %in% grep("Dragon Age", games$Name, value = TRUE)]<- "Dragon Age"
games$Franchise[games$Name %in% grep("Dragon Quest", games$Name, value = TRUE)]<- "Dragon Quest"
games$Franchise[games$Name %in% grep("Dragon Warrior", games$Name, value = TRUE)]<- "Dragon Quest"
games$Franchise[games$Name %in% grep(" Warriors", games$Name, value = TRUE)]<- "Musou"
games$Franchise[games$Name %in% grep("Elder Scrolls", games$Name, value = TRUE)]<- "Elder Scrolls"
games$Franchise[games$Name %in% grep("Fallout", games$Name, value = TRUE)]<- "Fallout"
games$Franchise[games$Name %in% grep("Far Cry", games$Name, value = TRUE)]<- "Far Cry"
games$Franchise[games$Name %in% grep("FIFA", games$Name, value = TRUE)]<- "FIFA"
games$Franchise[games$Name %in% grep("Final Fantasy", games$Name, value = TRUE)]<- "Final Fantasy"
games$Franchise[games$Name %in% grep("Five Nights", games$Name, value = TRUE)]<- "FNAF"
games$Franchise[games$Name %in% grep("Frogger", games$Name, value = TRUE)]<- "Frogger"
games$Franchise[games$Name %in% grep("'n Goblins", games$Name, value = TRUE)]<- "Ghosts 'n Goblins"
games$Franchise[games$Name %in% grep("Grand Theft Auto", games$Name, value = TRUE)]<- "Grand Theft Auto"
games$Franchise[games$Name %in% grep("Guitar Hero", games$Name, value = TRUE)]<- "Guitar Hero"
games$Franchise[games$Name %in% grep("Half-Life", games$Name, value = TRUE)]<- "Half-Life"
games$Franchise[games$Name %in% grep("Harvest Moon", games$Name, value = TRUE)]<- "Harvest Moon"
games$Franchise[games$Name %in% grep("Hitman", games$Name, value = TRUE)]<- "Hitman"
games$Franchise[games$Name %in% grep("inFamous", games$Name, value = TRUE)]<- "inFamous"
games$Franchise[games$Name %in% grep("Bond", games$Name, value = TRUE)]<- "James Bond"
games$Franchise[games$Name %in% grep("Jet Set", games$Name, value = TRUE)]<- "Jet Set"
games$Franchise[games$Name %in% grep("Just Dance", games$Name, value = TRUE)]<- "Just Dance"
games$Franchise[games$Name %in% grep("Killer Instinct", games$Name, value = TRUE)]<- "Killer Instinct"
games$Franchise[games$Name %in% grep("Kingdom Hearts", games$Name, value = TRUE)]<- "Kingdom Hearts"
games$Franchise[games$Name %in% grep("Madden", games$Name, value = TRUE)]<- "Madden"
games$Franchise[games$Name %in% grep("Metal Gear", games$Name, value = TRUE)]<- "Metal Gear"
games$Franchise[games$Name %in% grep("Midnight Club", games$Name, value = TRUE)]<- "Midnight Club"
games$Franchise[games$Name %in% grep("Monster Hunter", games$Name, value = TRUE)]<- "Monster Hunter"
games$Franchise[games$Name %in% grep("Mortal Kombat", games$Name, value = TRUE)]<- "Mortal Kombat"
games$Franchise[games$Name %in% grep("NBA 2K", games$Name, value = TRUE)]<- "NBA 2K"
games$Franchise[games$Name %in% grep("Need for Speed", games$Name, value = TRUE)]<- "Need for Speed"
games$Franchise[games$Name %in% grep("Ninja Gaiden", games$Name, value = TRUE)]<- "Ninja Gaiden"
games$Franchise[games$Name %in% grep("Onimusha", games$Name, value = TRUE)]<- "Onimusha"
games$Franchise[games$Name %in% grep("Pac-", games$Name, value = TRUE)]<- "Pacman"
games$Franchise[games$Name %in% grep("Professor Layton", games$Name, value = TRUE)]<- "Professor Layton"
games$Franchise[games$Name %in% grep("Prince of Persia", games$Name, value = TRUE)]<- "Prince of Persia"
games$Franchise[games$Name %in% grep("Quake", games$Name, value = TRUE)]<- "Quake"
games$Franchise[games$Name %in% grep("Rayman", games$Name, value = TRUE)]<- "Rayman"
games$Franchise[games$Name %in% grep("Rampage", games$Name, value = TRUE)]<- "Rampage"
games$Franchise[games$Name %in% grep("Resident Evil", games$Name, value = TRUE)]<- "Resident Evil"
games$Franchise[games$Name %in% grep("Sims", games$Name, value = TRUE)]<- "Sims"
games$Franchise[games$Name %in% grep("Sonic", games$Name, value = TRUE)]<- "Sonic"
games$Franchise[games$Name %in% grep("Shin Megami Tensei", games$Name, value = TRUE)]<- "SMT"
games$Franchise[games$Name %in% grep("Persona ", games$Name, value = TRUE)]<- "Persona"
games$Franchise[games$Name %in% grep("Souls", games$Name, value = TRUE)]<- "Souls"
games$Franchise[games$Name %in% grep("Star Wars", games$Name, value = TRUE)]<- "Star Wars"
games$Franchise[games$Name %in% grep("Street Fighter", games$Name, value = TRUE)]<- "Street Fighter"
games$Franchise[games$Name %in% grep("Tales of", games$Name, value = TRUE)]<- "Tales of"
games$Franchise[games$Name %in% grep("Turok", games$Name, value = TRUE)]<- "Turok"
games$Franchise[games$Name %in% grep("Warcraft", games$Name, value = TRUE)]<- "Warcraft"
games$Franchise[games$Name %in% grep("Worms", games$Name, value = TRUE)]<- "Worms"

Now, let's visualize the dataset one more time.

In [None]:
games[games$Rank<=100,]

Great! Now for two more steps. First, let's turn all NAs into "None".

In [None]:
games$Franchise[is.na(games$Franchise)] <- "None"
games[games$Franchise == "None",]

Wew, we missed some popular franchises in the top 250, but we'll let it slide. 

Another option is to merge the Action, Adventure and Shooter genres, since Action can mean a lot of things (beat 'em ups, shoot 'em ups, hack and slash, etc.), but we'll let that slide too.
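
For completeness, here is a minimal sketch of what merging those genres could look like. The toy rows and the merged label "Action/Adventure/Shooter" are assumptions made up for illustration, and the notebook does not apply this step.

```r
# Toy genre column (illustrative only).
toy <- data.frame(
  Genre = c("Action", "Shooter", "Sports", "Adventure", "Puzzle"),
  stringsAsFactors = FALSE
)

# Genres to collapse into a single level.
merged <- c("Action", "Adventure", "Shooter")
toy$Genre[toy$Genre %in% merged] <- "Action/Adventure/Shooter"

# Count how many rows fall into each remaining genre.
genre_counts <- table(toy$Genre)
genre_counts
```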

**Data visualization**

Now we can start creating some plots to better understand the distribution of the data. 
First, let's see the distribution of sales by Genre.

In [None]:


ggplot(games, aes(x=Genre, y=Global_Sales, fill = Genre)) +
    geom_bar(stat="identity")+
    ggtitle("Global Sales by Genre")
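
With `stat="identity"`, `geom_bar` stacks one segment per game, so the height of each bar is the total for that genre. The same totals can be computed explicitly with `aggregate`, which also makes it easy to sort them. The sketch below uses made-up rows and base R only.

```r
# Toy rows (illustrative only).
toy <- data.frame(
  Genre        = c("Action", "Sports", "Action", "Strategy", "Sports"),
  Global_Sales = c(5.0, 3.0, 2.5, 0.5, 1.0),
  stringsAsFactors = FALSE
)

# Total sales per genre, then sorted from best to worst selling.
by_genre <- aggregate(Global_Sales ~ Genre, data = toy, FUN = sum)
by_genre <- by_genre[order(-by_genre$Global_Sales), ]
by_genre
```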

Looks like Action is the best-selling genre by quite a bit, followed by Sports. Strategy games don't sell that well, unfortunately. What about the current-gen consoles (as of 22/3/2017): how are their games doing?

In [None]:
ggplot(subset(games, Platform %in% c("WiiU","XOne","PS4","3DS","PSV")), aes(x=Platform, y=Global_Sales, fill = Platform)) +
    geom_bar(stat="identity")+
    ggtitle("Global Sales by Platform")

Looks like games sell really well on 3DS and PS4! PSVita and WiiU, not so much.
Now let's look at our new feature, Franchise. Let's see the total sales of the top 10 franchises.

In [None]:
#Create top 10 best selling franchises;
BSF <- aggregate(games$Global_Sales, by=list(Franchise=games$Franchise), FUN=sum)
BSF <- BSF[order(-BSF$x),]
BSF <- BSF[BSF$Franchise != "None", ]
tenBSF <- BSF[1:10,] 

ggplot(tenBSF, 
       aes(x=Franchise, y=x, fill = Franchise)) +
    geom_bar(stat="identity")+
    ggtitle("Global Sales by best selling Franchises")

Looks like Mario, Pokemon and Call of Duty are the best sellers. I think we've seen enough. Before going to the prediction process, let's remove Rank and Year from the dataset, as they will not help our model in the slightest: Rank is just an identifier sorted by Global_Sales, and video games sell more in recent years than in the past. We could argue for adding a Generation feature, but I think the Platform feature does almost the same job that a Generation feature would.

In [None]:
full <- games[!(names(games) %in% c("Year","Rank") )]
full$Franchise <- as.factor(full$Franchise)
head(full)

In [None]:
str(full)

In [None]:
head(full)

**Prediction**

Since we want to predict a continuous variable (Global_Sales), let's use regression to try to predict it. First, we have to split the dataset in two: the training set and the test set. Because the dataset is sorted, we have to shuffle it randomly first. Since we are only using categorical variables, the raw Publisher factor cannot be used as-is (it has too many levels; see the update below), and we ease up on the Franchise variable by splitting it into tiers: the top 42 franchises are Popular, 43 to 84 are Normal, 85 to 127 are Unpopular, and None will still be None.

Update: Modifying the Platform feature to have fewer levels.

In [None]:
full$Platform <- as.character(full$Platform)
full$Platform[full$Platform %in% c("Wii","NES","GB","DS","SNES","GBA","3DS","N64","GC","WiiU")] <- "Nintendo"
full$Platform[full$Platform %in% c("PS","PS2","PS3","PS4","PSP","PSV")] <- "Sony"
full$Platform[full$Platform %in% c("XB","XOne","X360")] <- "Microsoft"
full$Platform[full$Platform %in% c("GG","DC","SAT","GEN")] <- "Sega"
full$Platform[!(full$Platform  %in% c("Nintendo","Sony","Microsoft","Sega"))] <- "Other"
full$Platform <- as.factor(full$Platform)
str(full)
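
The grouping above can also be expressed as a named lookup vector, which keeps the platform-to-maker mapping in one place. This is a sketch on a few platform codes, and `maker` is a hypothetical name introduced for the example.

```r
# A few platform codes (illustrative subset).
toy_platform <- c("Wii", "PS4", "X360", "DC", "PC", "3DS")

# Platform code -> console maker. Anything missing from the table becomes "Other".
maker <- c("Wii" = "Nintendo", "3DS" = "Nintendo", "PS4" = "Sony",
           "X360" = "Microsoft", "DC" = "Sega")

# Indexing by name returns NA for codes that are not in the table.
grouped <- unname(maker[toy_platform])
grouped[is.na(grouped)] <- "Other"
grouped
```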

Update: Replacing each publisher with a tier based on its total global sales.

In [None]:

#Best selling publishers ordered by sales.
BSP <- aggregate(games$Global_Sales, by=list(Publisher=games$Publisher), FUN=sum)
BSP <- BSP[order(-BSP$x),]
BSP$Publisher <- as.character(BSP$Publisher)
head(BSP)

In [None]:
#Changes
full$Publisher <- as.character(full$Publisher)
full$Publisher[full$Publisher %in% BSP$Publisher[BSP$x >=50.00]] <- "AAA"
full$Publisher[full$Publisher %in% BSP$Publisher[BSP$x >=5.00 & BSP$x <50.00]] <- "AA"
full$Publisher[full$Publisher %in% BSP$Publisher[BSP$x >=1.00 & BSP$x <5.00]] <- "A"
full$Publisher[full$Publisher %in% BSP$Publisher[BSP$x <1.00]] <- "N"

full$Publisher <- as.factor(full$Publisher)
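
The four threshold assignments can also be written as a single call to `cut`. The sketch below uses made-up publishers and totals, and the names `toy_totals` and `toy_games` are hypothetical. `right = FALSE` makes each interval include its lower bound, which matches the `>=` comparisons above.

```r
# Made-up publisher totals, shaped like BSP (column x holds total sales).
toy_totals <- data.frame(
  Publisher = c("Pub A", "Pub B", "Pub C", "Pub D"),
  x = c(120.0, 50.0, 4.99, 0.3),
  stringsAsFactors = FALSE
)

# Intervals: [-Inf,1) -> N, [1,5) -> A, [5,50) -> AA, [50,Inf) -> AAA.
toy_totals$Tier <- cut(toy_totals$x,
                       breaks = c(-Inf, 1, 5, 50, Inf),
                       labels = c("N", "A", "AA", "AAA"),
                       right = FALSE)

# Attach the tier to each game by looking up its publisher.
toy_games <- data.frame(Publisher = c("Pub C", "Pub A", "Pub D"),
                        stringsAsFactors = FALSE)
toy_games$Tier <- toy_totals$Tier[match(toy_games$Publisher, toy_totals$Publisher)]
toy_games
```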

In [None]:
head(full)

In [None]:
set.seed(17)
#Split franchises by sales, using BSF.
full$Franchise <- as.character(full$Franchise)
BSF$Franchise <- as.character(BSF$Franchise)
full$Franchise[full$Franchise %in%  BSF$Franchise[1:42]] <- "Popular" 
full$Franchise[full$Franchise %in%  BSF$Franchise[43:84]] <- "Normal"
full$Franchise[full$Franchise %in%  BSF$Franchise[85:127]] <- "Unpopular"

full$Franchise <- as.factor(full$Franchise)

full <- full[sample(nrow(full)),]
training <- full[1:8144,]
test <- full[8145:16286,]
training<- unique(training)
test <- unique(test)
#Copy after de-duplication so validate stays row-aligned with test
validate <- test 

test$Global_Sales <- NA
test <- test[!(names(test) %in% c("Name"))]
str(training)
str(test)
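
The split above relies on hard-coded row numbers, which only line up if the filtered dataset has exactly that many rows. A sketch of a split that derives the sizes from the data instead, on a made-up data frame (the 50/50 proportion mirrors the cell above, and all names here are illustrative):

```r
set.seed(17)

# Toy data standing in for full (values are made up).
toy <- data.frame(id = 1:11, Global_Sales = seq(0.1, 1.1, by = 0.1))

# Draw half of the row indices at random for training.
n <- nrow(toy)
train_idx <- sample(n, size = floor(n / 2))

# Negative indexing puts every remaining row in the test set.
toy_training <- toy[train_idx, ]
toy_test <- toy[-train_idx, ]

c(training = nrow(toy_training), test = nrow(toy_test))
```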

Now to create the model.

In [None]:
#Elastic net Model
suppressWarnings(library(caret))
suppressWarnings(library(glmnet))

lambdagrid <- 10 ^ seq(2,-2,length = 100)
alphagrid <-  seq(0,1, length= 10 )

trnControl <- trainControl(
                method = "repeatedcv", # caret documents this value in lowercase
                number= 10,
                 repeats = 5)

srchGrd = expand.grid(.alpha = alphagrid, .lambda = lambdagrid)

formula <- Global_Sales ~ Publisher + Genre + Franchise 
model <- train(formula, data=training, method = 'glmnet', tuneGrid= srchGrd, trControl = trnControl,
              standardize=TRUE, maxit= 1000000 )
summary(model)
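
For reference, and assuming the Gaussian family that glmnet uses for a numeric response, the objective being minimised can be written as follows, where $N$ is the number of training rows, $\lambda$ is the penalty strength and $\alpha$ is the mixing parameter (the two quantities tuned through `srchGrd`):

$$
\min_{\beta_0,\,\beta}\ \frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{\top}\beta\right)^2+\lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\right]
$$

With $\alpha = 0$ the penalty is ridge, with $\alpha = 1$ it is the lasso, and values in between blend the two.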

In [None]:
model2 <- lm(formula,data= training)

Now let's use the models for prediction and compare how close to, or how far from, the actual values the predictions are.

In [None]:
model$bestTune
final <- model$finalModel

In [None]:
#Prediction
prediction <- predict(model, newdata = test) # caret predicts with the tuned alpha and lambda (model$bestTune)
summary(prediction)
str(prediction)
prediction2<- predict(model2,test)

And finally, the comparison.

In [None]:

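#Elastic net: pair each test game with its actual sales
Eval <- data.frame(Game= validate$Name, Actual = validate$Global_Sales)

prediction <- round(prediction,2)

#To be replaced: truncates Eval to the number of predictions
Eval <- Eval[1:length(prediction),]
#abs() forces the predicted sales to be non-negative
Eval$Predicted <- abs(prediction)


Eval$diff <- abs(Eval$Predicted - Eval$Actual)
Eval
#Root mean squared error of the elastic net predictions
MSER <- sqrt(mean(Eval$diff^2))
MSER


#Linear model: same comparison using prediction2
Eval <- data.frame(Game= validate$Name, Actual = validate$Global_Sales)

prediction2 <- round(prediction2,2)

#To be replaced: truncates Eval to the number of predictions
Eval <- Eval[1:length(prediction2),]
Eval$Predicted <- abs(prediction2)


Eval$diff <- abs(Eval$Predicted - Eval$Actual)
Eval
#Root mean squared error of the linear model predictions
MSER <- sqrt(mean(Eval$diff^2))
MSER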
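
The quantity stored in `MSER` is a root mean squared error. Since it is computed twice, it could be wrapped in a small helper. The sketch below uses made-up numbers, and `rmse` is a hypothetical function name. Comparing against a baseline that always predicts the mean gives a rough sense of whether a model is learning anything.

```r
# Root mean squared error between two numeric vectors of equal length.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Made-up actual and predicted sales.
actual    <- c(1.0, 2.0, 4.0)
predicted <- c(1.5, 2.0, 3.0)

model_rmse <- rmse(actual, predicted)

# Baseline: always predict the mean of the actual values.
baseline_rmse <- rmse(actual, rep(mean(actual), length(actual)))

c(model = model_rmse, baseline = baseline_rmse)
```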

To be changed..