# Kaggle Challenge

### Description

---
1. The goal of this notebook is to simulate a Kaggle challenge.
2. We use a dataset hosted on GitHub, which we download and use directly.
3. The task is to predict, group, and build recommendation systems from movie ratings.
---

### Data dictionary


| Field  | Type   | Description              |
|:-------|:------:|:-------------------------|
| Date   | string | Date of the change       |
| Open   | float  | Opening price            |
| High   | float  | Highest price of the day |
| Low    | float  | Lowest price of the day  |
| Close  | float  | Closing price            |
| Volume | float  | Total volume for the day |
  

# Installing the packages

In [None]:
# Install the required libraries
!pip install numpy pandas torch


# Documentation

1. **Pandas** -> [Link](https://pandas.pydata.org/docs/)
2. **Numpy** -> [Link](https://numpy.org/doc/)
3. **Scikit Learn** -> [Link](https://scikit-learn.org/stable/)
4. **Keras** -> [Link](https://keras.io/api/)
5. **TensorFlow** -> [Link](https://www.tensorflow.org/api_docs/python/tf/keras)
6. **PyTorch** -> [Link](https://pytorch.org/docs/stable/index.html)
7. **Kaggle** -> [Link](https://www.kaggle.com/c/predict-movie-ratings/overview)

# Getting the dataset

In [None]:
!git clone https://github.com/batestin1/coding_the_future_dio_redes_neurais.git # clone the repository
!mv coding_the_future_dio_redes_neurais/dataset /content/ # move only the dataset folder out of the repository
!rm -rf coding_the_future_dio_redes_neurais # delete the rest, which we do not need



Cloning into 'coding_the_future_dio_redes_neurais'...
remote: Enumerating objects: 10986, done.
remote: Counting objects: 100% (54/54), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 10986 (delta 24), reused 38 (delta 13), pack-reused 10932
Receiving objects: 100% (10986/10986), 191.19 MiB | 27.33 MiB/s, done.
Resolving deltas: 100% (47/47), done.
Updating files: 100% (11021/11021), done.


# Importing the libraries




In [None]:

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable



# Loading the data

In [None]:

# Load the training and test sets as integer arrays
training_set = pd.read_csv('/content/dataset/kaggle/train_v2.csv')
training_set = np.array(training_set, dtype = 'int')
test_set = pd.read_csv('/content/dataset/kaggle/test_v2.csv')
test_set = np.array(test_set, dtype = 'int')
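
The later cells index columns 0, 1 and 2 of these arrays as user id, movie id and rating. As a minimal sketch of what that layout implies, the block below turns a handful of such triples into a user-by-movie matrix. The toy values and the helper name `to_user_movie_matrix` are invented for illustration, and the vectorized assignment is an alternative to the per-user loop used in this notebook, not a copy of it.

```python
import numpy as np

# Toy (user_id, movie_id, rating) triples; all values are invented for illustration.
toy = np.array([
    [1, 1, 5],
    [1, 3, 2],
    [2, 2, 4],
    [3, 1, 1],
    [3, 3, 3],
], dtype=int)

# Ids are assumed to start at 1, so the largest id gives the matrix size.
n_users = int(toy[:, 0].max())
n_movies = int(toy[:, 1].max())

def to_user_movie_matrix(data, n_users, n_movies):
    """Return an (n_users, n_movies) array where 0 marks 'not rated'."""
    matrix = np.zeros((n_users, n_movies))
    # Shift the 1-based ids to 0-based row and column indices.
    matrix[data[:, 0] - 1, data[:, 1] - 1] = data[:, 2]
    return matrix

ratings = to_user_movie_matrix(toy, n_users, n_movies)
print(ratings)
```

Each row of `ratings` is one user's rating vector, which is the shape of input the autoencoder in this notebook is trained on.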


# Preparing the data and training the model

In [None]:

# Largest user and movie ids across both sets
nb_users = int(max(training_set[:, 0].max(), test_set[:, 0].max()))
nb_movies = int(max(training_set[:, 1].max(), test_set[:, 1].max()))

# Convert the (user, movie, rating) rows into one ratings vector per user
def convert(data):
    new_data = []
    for id_users in range(1, nb_users + 1):
        # Movies rated by this user and the ratings given
        id_movies = data[:,1][data[:,0] == id_users]
        id_ratings = data[:,2][data[:,0] == id_users]
        # 0 marks a movie the user has not rated
        ratings = np.zeros(nb_movies)
        ratings[id_movies - 1] = id_ratings
        new_data.append(list(ratings))
    return new_data
training_set = convert(training_set)
test_set = convert(test_set)

# Create the torch tensors
training_set = torch.FloatTensor(training_set)
test_set = torch.FloatTensor(test_set)

# Define the autoencoder (SAE) model
class SAE(nn.Module):
    def __init__(self):
        super(SAE, self).__init__()
        # Encoder: nb_movies -> 20 -> 10
        self.fc1 = nn.Linear(nb_movies, 20)
        self.fc2 = nn.Linear(20, 10)
        # Decoder: 10 -> 20 -> nb_movies
        self.fc3 = nn.Linear(10, 20)
        self.fc4 = nn.Linear(20, nb_movies)
        self.activation = nn.Sigmoid()
    def forward(self, x):
        x = self.activation(self.fc1(x))
        x = self.activation(self.fc2(x))
        x = self.activation(self.fc3(x))
        # No activation on the output layer, so predictions are unbounded
        x = self.fc4(x)
        return x

# Instantiate the model, the loss and the optimizer
sae = SAE()
criterion = nn.MSELoss()
optimizer = optim.RMSprop(sae.parameters(), lr = 1e-2, weight_decay = 0.5)

# Training
nb_epoch = 200
for epoch in range(1, nb_epoch + 1):
    train_loss = 0
    s = 0.
    for id_user in range(nb_users):
        inputs = training_set[id_user].unsqueeze(0)
        target = inputs.clone()
        # Skip users with no ratings
        if torch.sum(target > 0) > 0:
            optimizer.zero_grad()  # clear gradients left over from the previous user
            output = sae(inputs)
            # Ignore movies the user has not rated
            output[target == 0] = 0
            loss = criterion(output, target)
            # Rescale the mean from all movies to the rated movies only
            mean_corrector = nb_movies/float(torch.sum(target > 0) + 1e-10)
            loss.backward()
            train_loss += np.sqrt(loss.item()*mean_corrector)
            s += 1.
            optimizer.step()
    print('epoch: '+str(epoch)+' loss: '+str(train_loss/s))
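
The training loop zeroes the predictions for unrated movies and then multiplies the loss by `mean_corrector`. As a hedged illustration of why: `nn.MSELoss` averages over every movie, including the zeroed entries, so scaling by the ratio of all movies to rated movies should recover the mean squared error over the rated movies alone. The tensors below are invented values for a single user, not data from this notebook.

```python
import torch
import torch.nn as nn

# Invented values: one user, five movies, 0 means "not rated".
target = torch.tensor([[4.0, 0.0, 2.0, 0.0, 5.0]])
output = torch.tensor([[3.0, 1.5, 2.0, 4.0, 3.0]])

# Zero the predictions for unrated movies so they add nothing to the error.
masked = output.clone()
masked[target == 0] = 0

n_movies = target.shape[1]
n_rated = int((target > 0).sum())

# MSELoss averages over all n_movies entries, including the zeroed ones.
mse_all = nn.MSELoss()(masked, target).item()
corrected = mse_all * n_movies / n_rated

# Direct mean squared error over the rated entries only, for comparison.
rated = target > 0
mse_rated = ((output[rated] - target[rated]) ** 2).mean().item()

print(corrected, mse_rated)
```

The two printed values should agree up to floating-point error; taking the square root of either gives the per-user RMSE that the loop accumulates into `train_loss`.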



# Testing


In [None]:
# Testing
test_loss = 0
s = 0.
with torch.no_grad():  # no gradients are needed for evaluation
    for id_user in range(nb_users):
        # Predict from the user's training ratings, score against the test ratings
        inputs = training_set[id_user].unsqueeze(0)
        target = test_set[id_user].unsqueeze(0)
        if torch.sum(target > 0) > 0:
            output = sae(inputs)
            output[target == 0] = 0
            loss = criterion(output, target)
            mean_corrector = nb_movies/float(torch.sum(target > 0) + 1e-10)
            test_loss += np.sqrt(loss.item()*mean_corrector)
            s += 1.
print('Test loss: '+str(test_loss/s))
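
The notebook stops at the test loss. Since the stated goal includes recommendations, here is one possible sketch of turning a user's predicted ratings into a top-k list. The helper `recommend_top_k` is hypothetical and not part of the original notebook, and the tensors are invented stand-ins for the model output `sae(input)` and a row of `training_set`.

```python
import torch

def recommend_top_k(predicted, already_rated, k=3):
    """Return 0-based indices of the k highest-scoring movies the user has not rated.

    predicted: 1-D tensor of predicted ratings for one user.
    already_rated: 1-D tensor of the user's known ratings (0 = not rated).
    """
    scores = predicted.clone()
    # Exclude movies the user has already rated.
    scores[already_rated > 0] = float('-inf')
    # Never ask for more movies than remain unrated.
    k = min(k, int((already_rated == 0).sum()))
    return torch.topk(scores, k).indices.tolist()

# Invented values standing in for a model prediction and the known ratings.
predicted = torch.tensor([3.9, 4.8, 1.2, 4.1, 2.5])
known = torch.tensor([5.0, 0.0, 0.0, 0.0, 3.0])
print(recommend_top_k(predicted, known, k=2))
```

Because `convert` stores movie id `m` at column `m - 1`, adding 1 to each returned index would give the corresponding movie id.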