# Big Data Analysis Final Exam

- toc: false
- branch: master
- badges: false
- comments: false
- author: 최규빈

#### Department of Statistics, 201618968, 김종원

#### Environment: Colab

In [1]:
!pip install --upgrade fastai

Collecting fastai
  Downloading fastai-2.5.3-py3-none-any.whl (189 kB)
Collecting fastdownload<2,>=0.0.5
  Downloading fastdownload-0.0.5-py3-none-any.whl (13 kB)
Collecting fastcore<1.4,>=1.3.22
  Downloading fastcore-1.3.27-py3-none-any.whl (56 kB)
Installing collected packages: fastcore, fastdownload, fastai
  Attempting uninstall: fastai
    Found existing installation: fastai 1.0.61
    Uninstalling fastai-1.0.61:
      Successfully uninstalled fastai-1.0.61
Successfully installed fastai-2.5.3 fastcore-1.3.27 fastdownload-0.0.5


In [2]:
import numpy as np 
import pandas as pd 
import torch 
from fastai.collab import * 
from fastai.tabular.all import * 
from fastai.data.all import *
from fastai.vision.all import *

## `#1`. Chain rule and backpropagation

Suppose the data are given as follows.

- ${\bf X} = \begin{bmatrix} 1 & 2.1 \\ 1 & 3.0 \end{bmatrix}$

- ${\bf y} = \begin{bmatrix} 3.0 \\ 5.0 \end{bmatrix}$ 

Suppose the loss function is defined as follows.

$$loss={\bf v}^\top {\bf v}$$

Here ${\bf v}= {\bf y}-{\bf u}$ and ${\bf u}= {\bf X}{\bf W}$. Compute $\frac{\partial}{\partial {\bf W}}loss$ at ${\bf W} =\begin{bmatrix} 0.5 \\ 0.6 \end{bmatrix}$ using backpropagation, and verify the result with PyTorch's backward(). That is, work through (1)-(6).

In [3]:
ones= torch.ones(2)
x = torch.tensor([2.1,3.0])
X = torch.vstack([ones,x]).T
y = torch.tensor([3.0,5.0])
W = torch.tensor([0.5,0.6],requires_grad=True) 

#### `(1)` Compute the forward pass with PyTorch. That is, compute ${\bf u}$.

In [4]:
u = X @ W
u

tensor([1.7600, 2.3000], grad_fn=<MvBackward0>)

#### `(2)` Compute the error with PyTorch. That is, compute ${\bf v}$.

In [5]:
v = y - u
v

tensor([1.2400, 2.7000], grad_fn=<SubBackward0>)

#### `(3)` Compute the sum of squared errors with PyTorch. That is, compute $loss={\bf v}^\top {\bf v}$.

In [6]:
loss = v.T @ v
loss

tensor(8.8276, grad_fn=<DotBackward0>)

#### `(4)` Compute $\frac{\partial}{\partial {\bf v}} loss$ analytically (that is, derive the theoretical value). Verify it with PyTorch.

In [7]:
v

tensor([1.2400, 2.7000], grad_fn=<SubBackward0>)

In [8]:
A= torch.tensor([1.2400, 2.7000], requires_grad=True)
_loss = A @ A
_loss.backward()
print(A.grad.data)
print(2 * v)

tensor([2.4800, 5.4000])
tensor([2.4800, 5.4000], grad_fn=<MulBackward0>)
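
For the analytic part, a short derivation in component form (a sketch, using the values of ${\bf v}$ computed in (2)) gives the same numbers as the autograd check above:

$$loss = {\bf v}^\top {\bf v} = v_1^2 + v_2^2 \quad\Longrightarrow\quad \frac{\partial}{\partial {\bf v}} loss = \begin{bmatrix} 2v_1 \\ 2v_2 \end{bmatrix} = 2{\bf v} = \begin{bmatrix} 2.48 \\ 5.40 \end{bmatrix}$$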


#### `(5)` Compute $\frac{\partial }{\partial {\bf u}}{\bf v}^\top$ and $\frac{\partial }{\partial {\bf W}}{\bf u}^\top$ analytically. (Verification with PyTorch is not required.)

In [9]:
B = torch.zeros((2,2))

In [10]:
for i in range(2): 
    _u = torch.tensor([1.7600, 2.3000],requires_grad=True)  # the value of u from (1)
    _v = (y-_u)[i]
    _v.backward()
    B[:,i]= _u.grad.data

In [11]:
print(B)
print(-1 * np.eye(2))

tensor([[-1., -0.],
        [-0., -1.]])
[[-1. -0.]
 [-0. -1.]]


In [12]:
C = torch.zeros((2,2))

In [13]:
for i in range(2): 
    _W = torch.tensor([0.5, 0.6],requires_grad=True)
    _u = (X@_W)[i]
    _u.backward()
    C[:,i]= _W.grad.data

In [14]:
print(C)
print(X.T)

tensor([[1.0000, 1.0000],
        [2.1000, 3.0000]])
tensor([[1.0000, 1.0000],
        [2.1000, 3.0000]])
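
The analytic values behind these checks can be written out as follows. This is a sketch that assumes the layout used by `B` and `C` above, where column $i$ holds the gradient of the $i$-th component. Since $v_i = y_i - u_i$ and $u_i = \sum_j X_{ij} W_j$, we have $\frac{\partial v_i}{\partial u_j} = -1$ if $i=j$ and $0$ otherwise, and $\frac{\partial u_i}{\partial W_j} = X_{ij}$, so

$$\frac{\partial }{\partial {\bf u}}{\bf v}^\top = -{\bf I} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \qquad \frac{\partial }{\partial {\bf W}}{\bf u}^\top = {\bf X}^\top = \begin{bmatrix} 1 & 1 \\ 2.1 & 3.0 \end{bmatrix}$$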


#### `(6)` Using the results of (4)~(5) and the chain rule, compute $\frac{\partial}{\partial {\bf W}}loss$. Then verify it with the code below.

In [15]:
C @ B @ A.grad.data

tensor([ -7.8800, -21.4080])

```python
import torch
ones= torch.ones(2)
x = torch.tensor([2.1,3.0])
X = torch.vstack([ones,x]).T
y = torch.tensor([3.0,5.0])
W = torch.tensor([0.5,0.6],requires_grad=True) 
loss = (y-X@W).T @ (y-X@W)
loss.backward()
W.grad.data
```

In [16]:
ones= torch.ones(2)
x = torch.tensor([2.1,3.0])
X = torch.vstack([ones,x]).T
y = torch.tensor([3.0,5.0])
W = torch.tensor([0.5,0.6],requires_grad=True) 
loss = (y-X@W).T @ (y-X@W)
loss.backward()
W.grad.data

tensor([ -7.8800, -21.4080])
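
As a cross-check that does not rely on autograd, the chain-rule product from (4)~(5) can be compared with the closed form $-2{\bf X}^\top({\bf y}-{\bf X}{\bf W})$. The sketch below uses NumPy instead of PyTorch; the variable names `dloss_dv`, `dvT_du`, `duT_dW`, and `closed_form` are chosen here for illustration and do not appear in the exam.

```python
import numpy as np

# Data and evaluation point from the problem statement.
X = np.array([[1.0, 2.1], [1.0, 3.0]])
y = np.array([3.0, 5.0])
W = np.array([0.5, 0.6])

u = X @ W          # forward pass
v = y - u          # error

dloss_dv = 2 * v        # (4): d loss / d v
dvT_du = -np.eye(2)     # (5): d v^T / d u
duT_dW = X.T            # (5): d u^T / d W

# (6): chain rule, multiplied in the same order as C @ B @ A.grad
grad = duT_dW @ dvT_du @ dloss_dv

# Closed form of the same gradient.
closed_form = -2 * X.T @ (y - X @ W)

print(grad)
print(closed_form)
```

Both expressions should agree with the value returned by `W.grad.data` above, up to floating-point rounding.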

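## `#2`. Drink recommendation

The data below contain ratings given by 200 users for four categories of drinks, iced coffee (차가운커피), hot coffee (따뜻한커피), iced tea (차가운홍차), and hot tea (따뜻한홍차), with 10 varieties in each category.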

In [17]:
df = pd.read_csv("https://raw.githubusercontent.com/guebin/2021BDA/master/_notebooks/2021-12-04-recommend.csv")
df

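| | user | item | rating | item_name |
|---|---|---|---|---|
| 0 | 1 | 27 | 2.677878 | 차가운홍차7 |
| 1 | 1 | 28 | 2.382410 | 차가운홍차8 |
| 2 | 1 | 38 | 0.952034 | 따뜻한홍차8 |
| 3 | 1 | 21 | 2.359307 | 차가운홍차1 |
| 4 | 1 | 24 | 2.447412 | 차가운홍차4 |
| ... | ... | ... | ... | ... |
| 3995 | 200 | 28 | 2.401077 | 차가운홍차8 |
| 3996 | 200 | 31 | 3.798483 | 따뜻한홍차1 |
| 3997 | 200 | 22 | 2.104705 | 차가운홍차2 |
| 3998 | 200 | 26 | 2.248165 | 차가운홍차6 |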


#### `(1)` Build the user-item matrix.

An example of the expected result is shown below.

In [18]:
#hide-input 
from IPython.display import HTML
HTML('<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>차가운커피1</th>\n      <th>차가운커피2</th>\n      <th>차가운커피3</th>\n      <th>차가운커피4</th>\n      <th>차가운커피5</th>\n      <th>차가운커피6</th>\n      <th>차가운커피7</th>\n      <th>차가운커피8</th>\n      <th>차가운커피9</th>\n      <th>차가운커피10</th>\n      <th>따듯한커피1</th>\n      <th>따듯한커피2</th>\n      <th>따듯한커피3</th>\n      <th>따듯한커피4</th>\n      <th>따듯한커피5</th>\n      <th>따듯한커피6</th>\n      <th>따듯한커피7</th>\n      <th>따듯한커피8</th>\n      <th>따듯한커피9</th>\n      <th>따듯한커피10</th>\n      <th>차가운홍차1</th>\n      <th>차가운홍차2</th>\n      <th>차가운홍차3</th>\n      <th>차가운홍차4</th>\n      <th>차가운홍차5</th>\n      <th>차가운홍차6</th>\n      <th>차가운홍차7</th>\n      <th>차가운홍차8</th>\n      <th>차가운홍차9</th>\n      <th>차가운홍차10</th>\n      <th>따뜻한홍차1</th>\n      <th>따뜻한홍차2</th>\n      <th>따뜻한홍차3</th>\n      <th>따뜻한홍차4</th>\n      <th>따뜻한홍차5</th>\n      <th>따뜻한홍차6</th>\n      <th>따뜻한홍차7</th>\n      <th>따뜻한홍차8</th>\n      <th>따뜻한홍차9</th>\n      <th>따뜻한홍차10</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>user1</th>\n      <td>None</td>\n      <td>3.937672</td>\n      <td>None</td>\n      <td>3.989888</td>\n      <td>4.133222</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>4.015579</td>\n      <td>2.103387</td>\n      <td>2.361724</td>\n      <td>None</td>\n      <td>2.273406</td>\n      <td>2.295347</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.791477</td>\n      <td>None</td>\n      <td>2.359307</td>\n      <td>2.565654</td>\n      <td>None</td>\n      <td>2.447412</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.677878</td>\n      <td>2.38241</td>\n      <td>2.194201</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>0.887225</td>\n      <td>1.014088</td>\n      <td>None</td>\n      <td>0.952034</td>\n      <td>0.658081</td>\n      
<td>1.235058</td>\n    </tr>\n    <tr>\n      <th>user2</th>\n      <td>4.098147</td>\n      <td>4.094224</td>\n      <td>None</td>\n      <td>3.765555</td>\n      <td>None</td>\n      <td>None</td>\n      <td>3.988153</td>\n      <td>None</td>\n      <td>4.349755</td>\n      <td>3.640496</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.707521</td>\n      <td>2.765143</td>\n      <td>2.310812</td>\n      <td>2.458836</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.22282</td>\n      <td>2.621137</td>\n      <td>None</td>\n      <td>2.510424</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.788081</td>\n      <td>None</td>\n      <td>2.404252</td>\n      <td>2.908625</td>\n      <td>None</td>\n      <td>1.400812</td>\n      <td>None</td>\n      <td>0.654011</td>\n      <td>None</td>\n      <td>1.129268</td>\n      <td>None</td>\n      <td>None</td>\n      <td>0.703928</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>user3</th>\n      <td>3.819119</td>\n      <td>None</td>\n      <td>4.228748</td>\n      <td>3.79414</td>\n      <td>None</td>\n      <td>4.08909</td>\n      <td>3.776395</td>\n      <td>None</td>\n      <td>4.583121</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.7361</td>\n      <td>None</td>\n      <td>2.219188</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.791662</td>\n      <td>None</td>\n      <td>2.729578</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.494008</td>\n      <td>2.440778</td>\n      <td>0.695669</td>\n      <td>None</td>\n      <td>0.840201</td>\n      <td>0.960158</td>\n      <td>None</td>\n      <td>1.019722</td>\n      <td>1.287193</td>\n      <td>1.354343</td>\n      <td>1.237186</td>\n      <td>0.985125</td>\n    </tr>\n    <tr>\n      <th>user4</th>\n      
<td>4.243031</td>\n      <td>3.985556</td>\n      <td>4.3557</td>\n      <td>4.200771</td>\n      <td>None</td>\n      <td>4.068798</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>4.149567</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.466804</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.104525</td>\n      <td>2.341672</td>\n      <td>2.463411</td>\n      <td>2.56218</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.37737</td>\n      <td>2.37356</td>\n      <td>None</td>\n      <td>2.317104</td>\n      <td>2.5877</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>1.014652</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>1.09685</td>\n      <td>0.664659</td>\n      <td>1.148056</td>\n      <td>1.302336</td>\n    </tr>\n    <tr>\n      <th>user5</th>\n      <td>3.855109</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>3.772252</td>\n      <td>4.18115</td>\n      <td>4.077935</td>\n      <td>None</td>\n      <td>3.905809</td>\n      <td>2.566041</td>\n      <td>2.412227</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.715758</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.651073</td>\n      <td>None</td>\n      <td>2.454781</td>\n      <td>2.654822</td>\n      <td>2.382804</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.599824</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>0.851721</td>\n      <td>1.313315</td>\n      <td>None</td>\n      <td>1.093123</td>\n      <td>None</td>\n      <td>0.759305</td>\n      <td>1.336896</td>\n      <td>None</td>\n      <td>0.742396</td>\n      <td>1.064772</td>\n    </tr>\n    <tr>\n      <th>...</th>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      
<td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n      <td>...</td>\n    </tr>\n    <tr>\n      <th>user196</th>\n      <td>0.788662</td>\n      <td>0.704273</td>\n      <td>0.776555</td>\n      <td>0.8481</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>0.686273</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.164656</td>\n      <td>2.549222</td>\n      <td>2.614974</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.51912</td>\n      <td>2.355786</td>\n      <td>2.509917</td>\n      <td>2.382942</td>\n      <td>2.494133</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.457732</td>\n      <td>None</td>\n      <td>4.014754</td>\n      <td>4.184846</td>\n      <td>None</td>\n      <td>4.126758</td>\n      <td>None</td>\n      <td>None</td>\n      <td>4.364885</td>\n      <td>None</td>\n      <td>3.767153</td>\n      <td>4.405117</td>\n    </tr>\n    <tr>\n      <th>user197</th>\n      <td>1.303235</td>\n      <td>1.43626</td>\n      <td>1.00433</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>1.486788</td>\n      <td>1.295232</td>\n      <td>None</td>\n      <td>0.920782</td>\n      <td>2.511827</td>\n      <td>None</td>\n 
     <td>2.361798</td>\n      <td>None</td>\n      <td>2.354619</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.21937</td>\n      <td>2.401316</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.793289</td>\n      <td>None</td>\n      <td>2.464333</td>\n      <td>2.426258</td>\n      <td>4.253895</td>\n      <td>None</td>\n      <td>None</td>\n      <td>4.369466</td>\n      <td>None</td>\n      <td>3.996908</td>\n      <td>3.853673</td>\n      <td>None</td>\n      <td>3.917286</td>\n      <td>4.57724</td>\n    </tr>\n    <tr>\n      <th>user198</th>\n      <td>1.251698</td>\n      <td>None</td>\n      <td>1.017147</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>0.806444</td>\n      <td>None</td>\n      <td>2.520115</td>\n      <td>2.646957</td>\n      <td>None</td>\n      <td>2.952988</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.190244</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.282611</td>\n      <td>None</td>\n      <td>2.480411</td>\n      <td>2.663661</td>\n      <td>2.402259</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.708267</td>\n      <td>2.109672</td>\n      <td>2.824608</td>\n      <td>4.380199</td>\n      <td>4.022162</td>\n      <td>None</td>\n      <td>3.895619</td>\n      <td>None</td>\n      <td>3.887536</td>\n      <td>None</td>\n      <td>3.862879</td>\n      <td>None</td>\n      <td>4.261574</td>\n    </tr>\n    <tr>\n      <th>user199</th>\n      <td>1.007993</td>\n      <td>None</td>\n      <td>0.955789</td>\n      <td>None</td>\n      <td>0.846838</td>\n      <td>None</td>\n      <td>0.58893</td>\n      <td>1.046728</td>\n      <td>None</td>\n      <td>1.139212</td>\n      <td>2.739859</td>\n      <td>2.459454</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      
<td>2.430707</td>\n      <td>None</td>\n      <td>2.413188</td>\n      <td>2.608065</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.764538</td>\n      <td>2.389897</td>\n      <td>2.29379</td>\n      <td>None</td>\n      <td>2.428555</td>\n      <td>2.406729</td>\n      <td>2.507149</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>4.039527</td>\n      <td>None</td>\n      <td>None</td>\n      <td>3.837071</td>\n      <td>4.103043</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n    </tr>\n    <tr>\n      <th>user200</th>\n      <td>0.717826</td>\n      <td>None</td>\n      <td>1.23011</td>\n      <td>None</td>\n      <td>0.994098</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>1.14695</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.487716</td>\n      <td>2.56307</td>\n      <td>None</td>\n      <td>None</td>\n      <td>2.300041</td>\n      <td>2.552453</td>\n      <td>2.104705</td>\n      <td>2.862709</td>\n      <td>2.416833</td>\n      <td>None</td>\n      <td>2.248165</td>\n      <td>2.401267</td>\n      <td>2.401077</td>\n      <td>None</td>\n      <td>2.21877</td>\n      <td>3.798483</td>\n      <td>None</td>\n      <td>4.224537</td>\n      <td>None</td>\n      <td>None</td>\n      <td>4.117838</td>\n      <td>None</td>\n      <td>3.920277</td>\n      <td>4.00732</td>\n      <td>None</td>\n    </tr>\n  </tbody>\n</table>')

Unnamed: 0,차가운커피1,차가운커피2,차가운커피3,차가운커피4,차가운커피5,차가운커피6,차가운커피7,차가운커피8,차가운커피9,차가운커피10,따듯한커피1,따듯한커피2,따듯한커피3,따듯한커피4,따듯한커피5,따듯한커피6,따듯한커피7,따듯한커피8,따듯한커피9,따듯한커피10,차가운홍차1,차가운홍차2,차가운홍차3,차가운홍차4,차가운홍차5,차가운홍차6,차가운홍차7,차가운홍차8,차가운홍차9,차가운홍차10,따뜻한홍차1,따뜻한홍차2,따뜻한홍차3,따뜻한홍차4,따뜻한홍차5,따뜻한홍차6,따뜻한홍차7,따뜻한홍차8,따뜻한홍차9,따뜻한홍차10
user1,,3.937672,,3.989888,4.133222,,,,,4.015579,2.103387,2.361724,,2.273406,2.295347,,,,2.791477,,2.359307,2.565654,,2.447412,,,2.677878,2.38241,2.194201,,,,,,0.887225,1.014088,,0.952034,0.658081,1.235058
user2,4.098147,4.094224,,3.765555,,,3.988153,,4.349755,3.640496,,,2.707521,2.765143,2.310812,2.458836,,,,2.22282,2.621137,,2.510424,,,,2.788081,,2.404252,2.908625,,1.400812,,0.654011,,1.129268,,,0.703928,
user3,3.819119,,4.228748,3.79414,,4.08909,3.776395,,4.583121,,,2.7361,,2.219188,,,,,2.791662,,2.729578,,,,,,,,2.494008,2.440778,0.695669,,0.840201,0.960158,,1.019722,1.287193,1.354343,1.237186,0.985125
user4,4.243031,3.985556,4.3557,4.200771,,4.068798,,,,4.149567,,,2.466804,,,2.104525,2.341672,2.463411,2.56218,,,,2.37737,2.37356,,2.317104,2.5877,,,,1.014652,,,,,,1.09685,0.664659,1.148056,1.302336
user5,3.855109,,,,,3.772252,4.18115,4.077935,,3.905809,2.566041,2.412227,,,,2.715758,,,2.651073,,2.454781,2.654822,2.382804,,,,2.599824,,,,0.851721,1.313315,,1.093123,,0.759305,1.336896,,0.742396,1.064772
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
user196,0.788662,0.704273,0.776555,0.8481,,,,0.686273,,,2.164656,2.549222,2.614974,,,,,,2.51912,2.355786,2.509917,2.382942,2.494133,,,,,,2.457732,,4.014754,4.184846,,4.126758,,,4.364885,,3.767153,4.405117
user197,1.303235,1.43626,1.00433,,,,1.486788,1.295232,,0.920782,2.511827,,2.361798,,2.354619,,,,,2.21937,2.401316,,,,,,2.793289,,2.464333,2.426258,4.253895,,,4.369466,,3.996908,3.853673,,3.917286,4.57724
user198,1.251698,,1.017147,,,,,,,0.806444,,2.520115,2.646957,,2.952988,,,2.190244,,,2.282611,,2.480411,2.663661,2.402259,,,2.708267,2.109672,2.824608,4.380199,4.022162,,3.895619,,3.887536,,3.862879,,4.261574
user199,1.007993,,0.955789,,0.846838,,0.58893,1.046728,,1.139212,2.739859,2.459454,,,,2.430707,,2.413188,2.608065,,,2.764538,2.389897,2.29379,,2.428555,2.406729,2.507149,,,,4.039527,,,3.837071,4.103043,,,,


In [19]:
df2 = pd.DataFrame([[None]*40]*200,columns=['차가운커피'+str(i) for i in range(1,11)]+['따뜻한커피'+str(i) for i in range(1,11)]+['차가운홍차'+str(i) for i in range(1,11)]+['따뜻한홍차'+str(i) for i in range(1,11)])
df2.index = pd.Index(['user'+str(i) for i in range(1,201)])
df2

Unnamed: 0,차가운커피1,차가운커피2,차가운커피3,차가운커피4,차가운커피5,차가운커피6,차가운커피7,차가운커피8,차가운커피9,차가운커피10,따뜻한커피1,따뜻한커피2,따뜻한커피3,따뜻한커피4,따뜻한커피5,따뜻한커피6,따뜻한커피7,따뜻한커피8,따뜻한커피9,따뜻한커피10,차가운홍차1,차가운홍차2,차가운홍차3,차가운홍차4,차가운홍차5,차가운홍차6,차가운홍차7,차가운홍차8,차가운홍차9,차가운홍차10,따뜻한홍차1,따뜻한홍차2,따뜻한홍차3,따뜻한홍차4,따뜻한홍차5,따뜻한홍차6,따뜻한홍차7,따뜻한홍차8,따뜻한홍차9,따뜻한홍차10
user1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
user2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
user3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
user4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
user5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
user196,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
user197,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
user198,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
user199,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


#### `(2)` Look up the first user's ratings and describe this user's taste. Does the user prefer coffee or tea? Does the user prefer hot drinks or cold drinks?

In [20]:
for (i,j) in zip(df.user.to_list(), df.item.to_list()):
    df2.iloc[i-1,j-1]=df.query('user == @i and item == @j')['rating'].to_list()[0]
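
The loop above fills the matrix one cell at a time. A vectorized alternative is sketched below with `DataFrame.pivot`. The `toy` table is a hypothetical miniature with the same column names as `df`, not the exam data, and `wide` is a name chosen here for illustration.

```python
import pandas as pd

# Hypothetical miniature of the ratings table (same column names as `df`).
toy = pd.DataFrame({
    "user": [1, 1, 2, 3],
    "item": [1, 2, 2, 1],
    "rating": [4.0, 2.5, 3.0, 1.0],
    "item_name": ["차가운커피1", "차가운커피2", "차가운커피2", "차가운커피1"],
})

# One row per user, one column per drink; unrated cells become NaN.
wide = toy.pivot(index="user", columns="item_name", values="rating")

# Label rows as user1, user2, ... to match the layout of df2.
wide.index = ["user" + str(u) for u in wide.index]
print(wide)
```

Note that `pivot` orders the columns by label, so on the full data the columns would likely need to be reordered to match the layout of `df2`.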

In [21]:
df2.loc['user1']

차가운커피1         None
차가운커피2      3.93767
차가운커피3         None
차가운커피4      3.98989
차가운커피5      4.13322
차가운커피6         None
차가운커피7         None
차가운커피8         None
차가운커피9         None
차가운커피10     4.01558
따뜻한커피1      2.10339
따뜻한커피2      2.36172
따뜻한커피3         None
따뜻한커피4      2.27341
따뜻한커피5      2.29535
따뜻한커피6         None
따뜻한커피7         None
따뜻한커피8         None
따뜻한커피9      2.79148
따뜻한커피10        None
차가운홍차1      2.35931
차가운홍차2      2.56565
차가운홍차3         None
차가운홍차4      2.44741
차가운홍차5         None
차가운홍차6         None
차가운홍차7      2.67788
차가운홍차8      2.38241
차가운홍차9       2.1942
차가운홍차10        None
따뜻한홍차1         None
따뜻한홍차2         None
따뜻한홍차3         None
따뜻한홍차4         None
따뜻한홍차5     0.887225
따뜻한홍차6      1.01409
따뜻한홍차7         None
따뜻한홍차8     0.952034
따뜻한홍차9     0.658081
따뜻한홍차10     1.23506
Name: user1, dtype: object

- This user gave roughly 4 points to iced coffee, 2 to 3 points to hot coffee, 2 to 3 points to iced tea, and about 1 point to hot tea.

- User 1 therefore prefers coffee to tea, and cold drinks to hot drinks.
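
The comparison can be made concrete by averaging user 1's ratings per category. The sketch below copies the non-missing values from the `df2.loc['user1']` output above; the dictionary name `user1` and the grouping into four lists are introduced here for illustration.

```python
from statistics import mean

# Non-missing ratings of user1, copied from the output above and grouped by category.
user1 = {
    "차가운커피": [3.93767, 3.98989, 4.13322, 4.01558],
    "따뜻한커피": [2.10339, 2.36172, 2.27341, 2.29535, 2.79148],
    "차가운홍차": [2.35931, 2.56565, 2.44741, 2.67788, 2.38241, 2.1942],
    "따뜻한홍차": [0.887225, 1.01409, 0.952034, 0.658081, 1.23506],
}

category_mean = {name: mean(values) for name, values in user1.items()}
for name, m in category_mean.items():
    print(name, round(m, 2))
```

Iced coffee has the highest mean and hot tea the lowest, which matches the description above.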

#### `(3)` Train a recommendation model with fastai. (You do not have to use `nn`.)

In [22]:
dls = CollabDataLoaders.from_df(df, bs=100) 

In [23]:
dls.items

| | user | item | rating | item_name |
|---|---|---|---|---|
| 2140 | 108 | 6 | 2.526742 | 차가운커피6 |
| 268 | 14 | 3 | 3.857859 | 차가운커피3 |
| 239 | 12 | 11 | 2.787438 | 따듯한커피1 |
| 866 | 44 | 21 | 2.789101 | 차가운홍차1 |
| 2012 | 101 | 2 | 2.657871 | 차가운커피2 |
| ... | ... | ... | ... | ... |
| 1775 | 89 | 28 | 1.062476 | 차가운홍차8 |
| 1377 | 69 | 12 | 3.673949 | 따듯한커피2 |
| 1735 | 87 | 16 | 4.119558 | 따듯한커피6 |
| 1799 | 90 | 19 | 4.070353 | 따듯한커피9 |


In [24]:
lrnr = collab_learner(dls,n_factors=2,y_range=(0,5))
lrnr.fit(30,0.01)

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 1.16963 | 1.191597 | 00:00 |
| 1 | 1.094352 | 1.018499 | 00:00 |
| 2 | 0.865296 | 0.568111 | 00:00 |
| 3 | 0.551166 | 0.194925 | 00:00 |
| 4 | 0.316783 | 0.085433 | 00:00 |
| 5 | 0.185383 | 0.060612 | 00:00 |
| 6 | 0.115576 | 0.054059 | 00:00 |
| 7 | 0.078808 | 0.051556 | 00:00 |
| 8 | 0.059324 | 0.049775 | 00:00 |
| 9 | 0.049162 | 0.048789 | 00:00 |


In [25]:
lrnr.show_results()

| | user | item | rating | rating_pred |
|---|---|---|---|---|
| 0 | 192.0 | 22.0 | 2.454438 | 2.316325 |
| 1 | 111.0 | 16.0 | 1.101632 | 1.06744 |
| 2 | 102.0 | 18.0 | 1.345894 | 1.126822 |
| 3 | 15.0 | 34.0 | 1.08434 | 0.98109 |
| 4 | 155.0 | 31.0 | 3.733439 | 3.809435 |
| 5 | 91.0 | 1.0 | 2.680644 | 2.45608 |
| 6 | 131.0 | 26.0 | 4.125908 | 3.89141 |
| 7 | 11.0 | 22.0 | 2.308875 | 2.61687 |
| 8 | 170.0 | 14.0 | 2.674447 | 2.31887 |
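
To make the role of `n_factors` and `y_range` more concrete, the sketch below shows how a dot-product model with biases turns latent factors into a rating. This is my understanding of the kind of model fitted when `nn` is not used, so treat it as an assumption rather than a description of fastai internals. The helper names `sigmoid_range` and `predict` and all numeric values are hypothetical.

```python
import math

def sigmoid_range(x, lo, hi):
    """Squash a raw score into the interval (lo, hi)."""
    return 1 / (1 + math.exp(-x)) * (hi - lo) + lo

def predict(user_factors, item_factors, user_bias, item_bias, y_range=(0, 5)):
    """Rating = squashed (dot product of latent factors + user bias + item bias)."""
    score = sum(u * i for u, i in zip(user_factors, item_factors))
    score += user_bias + item_bias
    return sigmoid_range(score, *y_range)

# Hypothetical 2-factor embeddings (n_factors=2); not values from the trained model.
liked = predict([1.0, 0.5], [1.2, 0.4], user_bias=0.1, item_bias=0.2)
disliked = predict([1.0, 0.5], [-1.2, -0.4], user_bias=0.1, item_bias=0.2)
print(round(liked, 2), round(disliked, 2))
```

With `y_range=(0,5)` every prediction stays strictly between 0 and 5, and a raw score of 0 maps to 2.5.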


#### `(4)` Using the recommender from (3), compute the `fitted rating` of user 144 for every drink (40 in total). What is user 144's taste?

In [26]:
x144 = torch.tensor([[200,j] for j in range(1,41) ])

In [27]:
print(lrnr.model(x144.to("cuda:0"))[:10])
print(lrnr.model(x144.to("cuda:0"))[10:20])
print(lrnr.model(x144.to("cuda:0"))[20:30])
print(lrnr.model(x144.to("cuda:0"))[30:40])

tensor([0.9675, 0.9763, 1.0053, 0.9693, 0.9975, 1.0133, 0.9963, 1.0409, 0.9524,
        0.9597], device='cuda:0', grad_fn=<SliceBackward0>)
tensor([2.5536, 2.4887, 2.5252, 2.4530, 2.5330, 2.4838, 2.5185, 2.3947, 2.4543,
        2.3999], device='cuda:0', grad_fn=<SliceBackward0>)
tensor([2.3867, 2.2789, 2.4065, 2.3606, 2.3763, 2.3168, 2.3351, 2.4526, 2.3490,
        2.4030], device='cuda:0', grad_fn=<SliceBackward0>)
tensor([3.8897, 3.8909, 3.9336, 3.9357, 3.9637, 3.9896, 3.9630, 3.9191, 3.9420,
        4.0091], device='cuda:0', grad_fn=<SliceBackward0>)


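- The fitted ratings are about 1 for iced coffee, about 2.5 for hot coffee, about 2.4 for iced tea, and about 4 for hot tea, so this user prefers tea to coffee and hot drinks to cold drinks. Note that `x144` above is built with user index 200, so these values are the fitted ratings of user 200; replacing 200 with 144 in `x144` gives the ratings for user 144.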

#### `(5)` Compute the `fitted rating` of every user (200 in total) for 차가운커피1. Which range of user numbers prefers iced coffee?

In [28]:
xall = torch.tensor([[i,1] for i in range(1,201) ])

In [29]:
for i in range(20):
  print( np.round( lrnr.model( xall.to("cuda:0"))[ i*10 : (i+1)*10 ].tolist() ) )

[4. 4. 4. 4. 4. 4. 4. 4. 4. 4.]
[4. 4. 4. 4. 4. 4. 4. 4. 4. 4.]
[4. 4. 4. 4. 4. 4. 4. 4. 4. 4.]
[4. 4. 4. 4. 4. 4. 4. 4. 4. 4.]
[4. 4. 4. 4. 4. 4. 4. 4. 4. 4.]
[2. 2. 2. 2. 2. 3. 2. 2. 2. 2.]
[3. 2. 2. 3. 2. 2. 2. 2. 2. 2.]
[2. 2. 3. 3. 2. 2. 3. 2. 2. 2.]
[2. 2. 3. 3. 2. 2. 3. 2. 3. 3.]
[2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
[3. 3. 2. 2. 3. 3. 3. 3. 3. 3.]
[2. 3. 3. 2. 3. 2. 3. 2. 2. 2.]
[3. 3. 2. 2. 3. 3. 2. 2. 2. 3.]
[3. 2. 3. 2. 2. 2. 2. 2. 3. 3.]
[3. 3. 3. 2. 2. 2. 2. 3. 3. 2.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


- Users 1 to 50 strongly prefer 차가운커피1 (fitted rating about 4), and users 151 to 200 do not like it (about 1). The remaining users fall in between (about 2 to 3).
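
Reading the ranges off the printed rows can also be automated. The sketch below scans a list of fitted ratings and returns the contiguous user ranges at or above a threshold. The function name `preferred_ranges`, the threshold, and the short `toy` list are hypothetical; on the exam data the input would be the 200 fitted ratings computed above.

```python
def preferred_ranges(ratings, threshold):
    """Return inclusive 1-based (first_user, last_user) ranges with rating >= threshold."""
    ranges, start = [], None
    for user, rating in enumerate(ratings, start=1):
        if rating >= threshold and start is None:
            start = user                      # a new run begins
        elif rating < threshold and start is not None:
            ranges.append((start, user - 1))  # the run ended at the previous user
            start = None
    if start is not None:                     # a run that reaches the last user
        ranges.append((start, len(ratings)))
    return ranges

# Hypothetical fitted ratings for seven users.
toy = [4.1, 3.9, 4.0, 2.4, 2.6, 1.0, 0.9]
print(preferred_ranges(toy, threshold=3.5))  # [(1, 3)]
```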

## `#3`. Movie recommendation

Use the code below to download the data and build `df`, then answer the questions.

```python
path = untar_data(URLs.ML_100k) 
ratings=pd.read_csv(path/'u.data', delimiter='\t', header=None, names=['user','movie','rating','timestamp'])
movies = pd.read_csv(path/'u.item', delimiter='|', encoding='latin-1', usecols=(0,1), names=('movie','title'), header=None)
df = ratings.merge(movies)
```

In [30]:
path = untar_data(URLs.ML_100k) 
ratings=pd.read_csv(path/'u.data', delimiter='\t', header=None, names=['user','movie','rating','timestamp'])
movies = pd.read_csv(path/'u.item', delimiter='|', encoding='latin-1', usecols=(0,1), names=('movie','title'), header=None)
df = ratings.merge(movies)

In [31]:
df

| | user | movie | rating | timestamp | title |
|---|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 | Kolya (1996) |
| 1 | 63 | 242 | 3 | 875747190 | Kolya (1996) |
| 2 | 226 | 242 | 5 | 883888671 | Kolya (1996) |
| 3 | 154 | 242 | 3 | 879138235 | Kolya (1996) |
| 4 | 306 | 242 | 5 | 876503793 | Kolya (1996) |
| ... | ... | ... | ... | ... | ... |
| 99995 | 840 | 1674 | 4 | 891211682 | Mamma Roma (1962) |
| 99996 | 655 | 1640 | 3 | 888474646 | Eighth Day, The (1996) |
| 99997 | 655 | 1637 | 3 | 888984255 | Girls Town (1996) |
| 99998 | 655 | 1630 | 3 | 887428735 | Silence of the Palace, The (Saimt el Qusur) (1994) |


#### `(1)` Train a recommendation model with fastai. (You do not have to use `nn`.)

In [32]:
dls = CollabDataLoaders.from_df(df,bs=64,item_name='title') 

In [33]:
lrnr3 = collab_learner(dls, n_factors=10, y_range=(0,5))
lrnr3.fit(10)

| epoch | train_loss | valid_loss | time |
|---|---|---|---|
| 0 | 1.121704 | 1.110029 | 00:10 |
| 1 | 0.909126 | 0.930696 | 00:10 |
| 2 | 0.893681 | 0.897511 | 00:11 |
| 3 | 0.8589 | 0.882952 | 00:10 |
| 4 | 0.843958 | 0.874156 | 00:10 |
| 5 | 0.863277 | 0.865314 | 00:10 |
| 6 | 0.820435 | 0.858363 | 00:10 |
| 7 | 0.820224 | 0.851192 | 00:10 |
| 8 | 0.824952 | 0.843012 | 00:10 |
| 9 | 0.78074 | 0.836791 | 00:10 |


#### `(2)` Compute the `fitted rating` of user 30 for the movies below.

```
1461    Terminator 2: Judgment Day (1991)
1462               Terminator, The (1984)
```

In [34]:
x1461 = torch.tensor([[30, 1461]])

In [35]:
lrnr3.model(x1461.to("cuda:0"))

tensor([4.3641], device='cuda:0', grad_fn=<AddBackward0>)

In [36]:
x1462 = torch.tensor([[30, 1462]])

In [37]:
lrnr3.model(x1462.to("cuda:0"))

tensor([4.3231], device='cuda:0', grad_fn=<AddBackward0>)

- The fitted ratings of user 30 are 4.3641 for Terminator 2: Judgment Day (1991) and 4.3231 for Terminator, The (1984).

## `#4`. Read each statement and answer with O (true) or X (false).

#### `(1)` The phenomenon in which the training loss decreases but the validation loss increases as training proceeds is called the vanishing gradient problem.

- X

#### `(2)` Batch normalization is one way to address the vanishing gradient problem.

- O

#### `(3)` Vanishing gradients occur more often in deep neural networks than in shallow ones.

- O
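
A rough numerical illustration of why depth matters, offered as a sketch rather than a model of any particular network: with scalar sigmoid layers (all weights fixed at 1, an assumption made here for simplicity), backpropagation multiplies one sigmoid derivative per layer, and each such derivative is at most 0.25. The function name `input_gradient` is hypothetical.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def input_gradient(x, depth):
    """d(output)/d(input) for `depth` stacked scalar sigmoid layers (weights fixed at 1)."""
    grad = 1.0
    for _ in range(depth):
        x = sigmoid(x)        # forward pass through one layer
        grad *= x * (1 - x)   # chain rule: sigmoid'(z) = s(z) * (1 - s(z)) <= 0.25
    return grad

shallow = input_gradient(0.5, depth=2)
deep = input_gradient(0.5, depth=20)
print(shallow, deep)
```

The gradient reaching the input of the 20-layer stack is many orders of magnitude smaller than in the 2-layer stack, which is the effect the statement refers to.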

#### `(4)` Backpropagation is one of the techniques for preventing overfitting.

- X

#### `(5)` When only the forward pass is needed, there is no need to store each layer's intermediate results in GPU memory.

- O