In [1]:
# Import graphlab
import graphlab as gl
# set canvas to show sframes and sgraphs in ipython notebook
gl.canvas.set_target('ipynb')
In [2]:
# Read the txt file and arrange it into three columns: user, music, rating
train_file = '/Users/GloriaWu/Desktop/Datasets/cjc/millionsong/song_usage_10000.txt'
sf = gl.SFrame.read_csv(train_file, header=False, delimiter='\t', verbose=False)
sf.rename({'X1':'user_id', 'X2':'music_id', 'X3':'rating'})
[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1530655198.log
This non-commercial license of GraphLab Create for academic use is assigned to 17210130103@fudan.edu.cn and will expire on April 27, 2019.
Out[2]:
+----------------------------------------------+--------------------+--------+
|                   user_id                    |      music_id      | rating |
+----------------------------------------------+--------------------+--------+
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOAKIMP12A8C130995 |   1    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBBMDR12A8C13253B |   2    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBXHDL12A81C204C0 |   1    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOBYHAJ12A6701BF1D |   1    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODACBL12A8C13C273 |   1    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODDNQT12A6D4F5F7E |   5    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SODXRTY12AB0180F3B |   1    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOFGUAY12AB017B0A8 |   1    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOFRQTD12A81C233C0 |   1    |
| b80344d063b5ccb3212f76538f3d9e43d87dca9e ... | SOHQWYZ12A6D4FA701 |   1    |
+----------------------------------------------+--------------------+--------+
[2000000 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
In [3]:
train_set, test_set = sf.random_split(0.8, seed=1)
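The split above can be illustrated without GraphLab. The sketch below assumes that each row is assigned to the first split independently with probability 0.8, which would be consistent with the training set reported later holding 1599753 rows rather than exactly 1600000. The helper name `random_split_rows` is hypothetical and is not part of the GraphLab API.

```python
import random

def random_split_rows(rows, fraction, seed=None):
    """Assign each row to the first split with probability `fraction`.

    Hypothetical helper: it mimics a per-row random split and is only an
    assumption about how SFrame.random_split behaves internally.
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second

# Toy data: 10000 row indices instead of the real 2000000 observations.
rows = list(range(10000))
train_rows, test_rows = random_split_rows(rows, 0.8, seed=1)
```

With a per-row split the sizes are only approximately 80/20, and reusing the same seed reproduces the same partition.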
In [4]:
popularity_model = gl.popularity_recommender.create(train_set, 
                                                    'user_id', 'music_id', 
                                                    target = 'rating')
Recsys training: model = popularity
Preparing data set.
    Data has 1599753 observations with 76085 users and 10000 items.
    Data prepared in: 2.77416s
1599753 observations to process; with 10000 unique items.
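As a rough illustration of what a popularity baseline does, the sketch below ranks items by their mean rating. That a target column makes the model score items by mean target value is an assumption about the library, not something the output above confirms. The function name `popularity_ranking` and the toy data are hypothetical.

```python
from collections import defaultdict

def popularity_ranking(observations):
    """Rank items by mean rating, highest first (ties broken by item id).

    `observations` is an iterable of (user_id, item_id, rating) tuples.
    Hypothetical sketch, not the GraphLab implementation.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, rating in observations:
        totals[item] += rating
        counts[item] += 1
    means = {item: totals[item] / counts[item] for item in totals}
    return sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))

toy = [("u1", "a", 1), ("u1", "b", 5), ("u2", "a", 3), ("u2", "c", 2)]
ranking = popularity_ranking(toy)
```

Every user receives the same ranking under this scheme, which is why it serves as a non-personalized baseline.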
In [5]:
item_sim_model = gl.item_similarity_recommender.create(train_set, 
                                                       'user_id', 'music_id', 
                                                       target = 'rating', 
                                                       similarity_type='cosine')
Recsys training: model = item_similarity
Preparing data set.
    Data has 1599753 observations with 76085 users and 10000 items.
    Data prepared in: 2.47708s
Training model from provided data.
Gathering per-item and per-user statistics.
+--------------------------------+------------+
| Elapsed Time (Item Statistics) | % Complete |
+--------------------------------+------------+
| 5.189ms                        | 2.5        |
| 215.427ms                      | 100        |
+--------------------------------+------------+
Setting up lookup tables.
Processing data in one pass using dense lookup tables.
+-------------------------------------+------------------+-----------------+
| Elapsed Time (Constructing Lookups) | Total % Complete | Items Processed |
+-------------------------------------+------------------+-----------------+
| 713.516ms                           | 0                | 0               |
| 1.71s                               | 60.25            | 6027            |
| 3.60s                               | 100              | 10000           |
+-------------------------------------+------------------+-----------------+
Finalizing lookup tables.
Generating candidate set for working with new users.
Finished training in 4.74198s
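The `similarity_type='cosine'` setting used above can be illustrated with a small pure-Python sketch. It treats each item as a sparse vector of user ratings and computes the cosine of the angle between two such vectors. The helper names `item_vectors` and `cosine` and the toy data are hypothetical, and the real model uses lookup tables and candidate pruning that are not reproduced here.

```python
import math
from collections import defaultdict

def item_vectors(observations):
    """Build a sparse {item: {user: rating}} mapping from (user, item, rating) tuples."""
    vectors = defaultdict(dict)
    for user, item, rating in observations:
        vectors[item][user] = float(rating)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

toy = [("u1", "a", 1), ("u1", "b", 2), ("u2", "a", 3), ("u2", "c", 1)]
vecs = item_vectors(toy)
sim_ab = cosine(vecs["a"], vecs["b"])  # items a and b share user u1
sim_bc = cosine(vecs["b"], vecs["c"])  # items b and c share no users
```

Items rated by disjoint sets of users get a similarity of zero, so only co-rated items contribute to recommendations.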
In [6]:
factorization_machine_model = gl.recommender.factorization_recommender.create(train_set, 
                                                                              'user_id', 'music_id',
                                                                              target='rating')
Recsys training: model = factorization_recommender
Preparing data set.
    Data has 1599753 observations with 76085 users and 10000 items.
    Data prepared in: 2.47306s
Training factorization_recommender for recommendations.
+--------------------------------+--------------------------------------------------+----------+
| Parameter                      | Description                                      | Value    |
+--------------------------------+--------------------------------------------------+----------+
| num_factors                    | Factor Dimension                                 | 8        |
| regularization                 | L2 Regularization on Factors                     | 1e-08    |
| solver                         | Solver used for training                         | sgd      |
| linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-10    |
| max_iterations                 | Maximum Number of Iterations                     | 50       |
+--------------------------------+--------------------------------------------------+----------+
  Optimizing model using SGD; tuning step size.
  Using 199969 / 1599753 points for tuning the step size.
+---------+-------------------+------------------------------------------+
| Attempt | Initial Step Size | Estimated Objective Value                |
+---------+-------------------+------------------------------------------+
| 0       | 25                | No Decrease (228.449 >= 39.1206)         |
| 1       | 6.25              | No Decrease (217.357 >= 39.1206)         |
| 2       | 1.5625            | No Decrease (185.475 >= 39.1206)         |
| 3       | 0.390625          | No Decrease (85.7315 >= 39.1206)         |
| 4       | 0.0976562         | 14.0922                                  |
| 5       | 0.0488281         | 10.3179                                  |
| 6       | 0.0244141         | 21.3508                                  |
+---------+-------------------+------------------------------------------+
| Final   | 0.0488281         | 10.3179                                  |
+---------+-------------------+------------------------------------------+
Starting Optimization.
+---------+--------------+-------------------+-----------------------+-------------+
| Iter.   | Elapsed Time | Approx. Objective | Approx. Training RMSE | Step Size   |
+---------+--------------+-------------------+-----------------------+-------------+
| Initial | 111us        | 43.795            | 6.61778               |             |
+---------+--------------+-------------------+-----------------------+-------------+
| 1       | 290.281ms    | 43.4288           | 6.58966               | 0.0488281   |
| 2       | 636.953ms    | 40.8134           | 6.38818               | 0.0290334   |
| 3       | 1.26s        | 37.74             | 6.14292               | 0.0214205   |
| 4       | 1.50s        | 35.2598           | 5.93761               | 0.0172633   |
| 5       | 1.79s        | 32.7702           | 5.72409               | 0.014603    |
| 6       | 2.08s        | 30.6898           | 5.53935               | 0.0127367   |
| 10      | 3.39s        | 24.6255           | 4.96176               | 0.008683    |
| 11      | 3.66s        | 23.5026           | 4.84726               | 0.00808399  |
| 15      | 4.60s        | 20.2983           | 4.50455               | 0.00640622  |
| 20      | 5.80s        | 17.7012           | 4.20635               | 0.00516295  |
| 25      | 7.00s        | 15.7711           | 3.97025               | 0.00436732  |
| 30      | 8.16s        | 14.4013           | 3.79378               | 0.00380916  |
| 35      | 9.45s        | 13.4676           | 3.66862               | 0.00339327  |
| 40      | 10.91s       | 12.5937           | 3.54749               | 0.00306991  |
| 45      | 12.13s       | 11.8875           | 3.44648               | 0.00281035  |
| 50      | 13.51s       | 11.1907           | 3.34384               | 0.00259682  |
+---------+--------------+-------------------+-----------------------+-------------+
Optimization Complete: Maximum number of passes through the data reached.
Computing final objective value and training RMSE.
       Final objective value: 10.2257
       Final training RMSE: 3.19628
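The training log above reports an SGD solver, L2 regularization, and a training RMSE per iteration. A minimal sketch of that kind of procedure follows, assuming a prediction of the form global mean + user bias + item bias + dot product of latent factors. The function `train_mf`, its defaults, and the toy data are hypothetical. Unlike the log above, the sketch keeps the step size constant and applies no regularization to the bias terms.

```python
import math
import random

def train_mf(observations, num_factors=2, step=0.05, reg=1e-8,
             iterations=50, seed=0):
    """Fit a biased matrix factorization with plain SGD (illustrative only)."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in observations})
    items = sorted({i for _, i, _ in observations})
    mu = sum(r for _, _, r in observations) / float(len(observations))
    bu = {u: 0.0 for u in users}  # user biases
    bi = {i: 0.0 for i in items}  # item biases
    p = {u: [rng.gauss(0, 0.1) for _ in range(num_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(num_factors)] for i in items}

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))

    def rmse():
        sq = sum((r - predict(u, i)) ** 2 for u, i, r in observations)
        return math.sqrt(sq / len(observations))

    history = [rmse()]  # training RMSE before any update
    data = list(observations)
    for _ in range(iterations):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            bu[u] += step * err
            bi[i] += step * err
            for f in range(num_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += step * (err * qi - reg * pu)
                q[i][f] += step * (err * pu - reg * qi)
        history.append(rmse())  # training RMSE after each pass
    return predict, history

toy = [("u1", "a", 1), ("u1", "b", 5), ("u2", "a", 2),
       ("u2", "b", 4), ("u3", "a", 1), ("u3", "c", 3)]
predict, history = train_mf(toy)
```

`history` plays the role of the "Approx. Training RMSE" column: one value before training and one after each pass through the data.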
In [7]:
len(train_set)
Out[7]:
1599753
In [8]:
result = gl.recommender.util.compare_models(test_set, 
                                            [popularity_model, item_sim_model, factorization_machine_model],
                                            user_sample=.5, skip_set=train_set)
compare_models: using 34355 users to estimate model performance
PROGRESS: Evaluate model M0
recommendations finished on 1000/34355 queries. users per second: 4222.51
recommendations finished on 2000/34355 queries. users per second: 4160.4
recommendations finished on 3000/34355 queries. users per second: 4144.48
recommendations finished on 4000/34355 queries. users per second: 4143.79
recommendations finished on 5000/34355 queries. users per second: 4134.09
recommendations finished on 6000/34355 queries. users per second: 4028.85
recommendations finished on 7000/34355 queries. users per second: 4026.89
recommendations finished on 8000/34355 queries. users per second: 3941.28
recommendations finished on 9000/34355 queries. users per second: 3940.19
recommendations finished on 10000/34355 queries. users per second: 3832.19
recommendations finished on 11000/34355 queries. users per second: 3736.98
recommendations finished on 12000/34355 queries. users per second: 3742.87
recommendations finished on 13000/34355 queries. users per second: 3697.09
recommendations finished on 14000/34355 queries. users per second: 3685.12
recommendations finished on 15000/34355 queries. users per second: 3708.32
recommendations finished on 16000/34355 queries. users per second: 3709.59
recommendations finished on 17000/34355 queries. users per second: 3721.08
recommendations finished on 18000/34355 queries. users per second: 3737.42
recommendations finished on 19000/34355 queries. users per second: 3746.75
recommendations finished on 20000/34355 queries. users per second: 3765.42
recommendations finished on 21000/34355 queries. users per second: 3732.14
recommendations finished on 22000/34355 queries. users per second: 3697.33
recommendations finished on 23000/34355 queries. users per second: 3678.93
recommendations finished on 24000/34355 queries. users per second: 3609.73
recommendations finished on 25000/34355 queries. users per second: 3542.54
recommendations finished on 26000/34355 queries. users per second: 3501.71
recommendations finished on 27000/34355 queries. users per second: 3474.57
recommendations finished on 28000/34355 queries. users per second: 3457.23
recommendations finished on 29000/34355 queries. users per second: 3468.36
recommendations finished on 30000/34355 queries. users per second: 3468.86
recommendations finished on 31000/34355 queries. users per second: 3481.79
recommendations finished on 32000/34355 queries. users per second: 3487.89
recommendations finished on 33000/34355 queries. users per second: 3417.42
recommendations finished on 34000/34355 queries. users per second: 3417.26
Precision and recall summary statistics by cutoff
+--------+-------------------+-------------------+
| cutoff |   mean_precision  |    mean_recall    |
+--------+-------------------+-------------------+
|   1    | 0.000523941202154 | 5.17000660325e-05 |
|   2    | 0.000509387279872 | 0.000170585099777 |
|   3    |  0.00047542812788 |  0.00029829474862 |
|   4    | 0.000531218163295 | 0.000475281733248 |
|   5    | 0.000576335322369 | 0.000658209948486 |
|   6    | 0.000528792509581 | 0.000715155057097 |
|   7    | 0.000503149884608 | 0.000798656348574 |
|   8    | 0.000480279435308 | 0.000842311610916 |
|   9    | 0.000562751661573 |  0.00115797092884 |
|   10   | 0.000523941202154 |  0.00121012248368 |
+--------+-------------------+-------------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 6.5963610596615725)

Per User RMSE (best)
+-------------------------------+-------+------+
|            user_id            | count | rmse |
+-------------------------------+-------+------+
| c1fe152a39495e06fbe5b11523... |   1   | 0.0  |
+-------------------------------+-------+------+
[1 rows x 3 columns]


Per User RMSE (worst)
+-------------------------------+-------+---------------+
|            user_id            | count |      rmse     |
+-------------------------------+-------+---------------+
| 50996bbabb6f7857bf0c801943... |   2   | 647.013311924 |
+-------------------------------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+--------------------+-------+-----------------+
|      music_id      | count |       rmse      |
+--------------------+-------+-----------------+
| SOXDPFW12A81C2319B |   7   | 0.0735294117647 |
+--------------------+-------+-----------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+--------------------+-------+---------------+
|      music_id      | count |      rmse     |
+--------------------+-------+---------------+
| SOUAGPQ12A8AE47B3A |   5   | 409.214387758 |
+--------------------+-------+---------------+
[1 rows x 3 columns]

PROGRESS: Evaluate model M1
recommendations finished on 1000/34355 queries. users per second: 3849.78
recommendations finished on 2000/34355 queries. users per second: 3329.13
recommendations finished on 3000/34355 queries. users per second: 3180.68
recommendations finished on 4000/34355 queries. users per second: 3083.73
recommendations finished on 5000/34355 queries. users per second: 3043.43
recommendations finished on 6000/34355 queries. users per second: 3062.24
recommendations finished on 7000/34355 queries. users per second: 3062.76
recommendations finished on 8000/34355 queries. users per second: 3004.64
recommendations finished on 9000/34355 queries. users per second: 2974.7
recommendations finished on 10000/34355 queries. users per second: 2948.12
recommendations finished on 11000/34355 queries. users per second: 2892.01
recommendations finished on 12000/34355 queries. users per second: 2880.17
recommendations finished on 13000/34355 queries. users per second: 2889.48
recommendations finished on 14000/34355 queries. users per second: 2806.97
recommendations finished on 15000/34355 queries. users per second: 2744.21
recommendations finished on 16000/34355 queries. users per second: 2787.05
recommendations finished on 17000/34355 queries. users per second: 2838.4
recommendations finished on 18000/34355 queries. users per second: 2875.55
recommendations finished on 19000/34355 queries. users per second: 2919.67
recommendations finished on 20000/34355 queries. users per second: 2952.86
recommendations finished on 21000/34355 queries. users per second: 2910.06
recommendations finished on 22000/34355 queries. users per second: 2911.49
recommendations finished on 23000/34355 queries. users per second: 2928.74
recommendations finished on 24000/34355 queries. users per second: 2936.68
recommendations finished on 25000/34355 queries. users per second: 2954.18
recommendations finished on 26000/34355 queries. users per second: 2981.67
recommendations finished on 27000/34355 queries. users per second: 2998.93
recommendations finished on 28000/34355 queries. users per second: 3023.92
recommendations finished on 29000/34355 queries. users per second: 3042.91
recommendations finished on 30000/34355 queries. users per second: 3056.01
recommendations finished on 31000/34355 queries. users per second: 3070.1
recommendations finished on 32000/34355 queries. users per second: 3073.17
recommendations finished on 33000/34355 queries. users per second: 3085.4
recommendations finished on 34000/34355 queries. users per second: 3022.81
Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    | 0.0515208848785 | 0.0152207718006 |
|   2    | 0.0626837432688 | 0.0334231764526 |
|   3    | 0.0736525493621 | 0.0540499584341 |
|   4    |  0.076007859118 | 0.0700283271584 |
|   5    | 0.0754126036967 |  0.084787121129 |
|   6    | 0.0739339251928 | 0.0972042088108 |
|   7    | 0.0713350104996 |  0.107113217467 |
|   8    | 0.0690292533838 |  0.116522295784 |
|   9    | 0.0666763692815 |  0.125131900634 |
|   10   | 0.0644214815893 |  0.132986067009 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 7.267859581870015)

Per User RMSE (best)
+-------------------------------+-------+-------------------+
|            user_id            | count |        rmse       |
+-------------------------------+-------+-------------------+
| dad5cd4678a6f6df34932432bc... |   1   | 0.000917145184108 |
+-------------------------------+-------+-------------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+-------------------------------+-------+---------------+
|            user_id            | count |      rmse     |
+-------------------------------+-------+---------------+
| 50996bbabb6f7857bf0c801943... |   2   | 650.121367005 |
+-------------------------------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+--------------------+-------+----------------+
|      music_id      | count |      rmse      |
+--------------------+-------+----------------+
| SOYSRGJ12A6D4FAC8B |   7   | 0.710705125665 |
+--------------------+-------+----------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+--------------------+-------+---------------+
|      music_id      | count |      rmse     |
+--------------------+-------+---------------+
| SOUAGPQ12A8AE47B3A |   5   | 411.184725496 |
+--------------------+-------+---------------+
[1 rows x 3 columns]

PROGRESS: Evaluate model M2
recommendations finished on 1000/34355 queries. users per second: 3130.4
recommendations finished on 2000/34355 queries. users per second: 3100.82
recommendations finished on 3000/34355 queries. users per second: 3132.68
recommendations finished on 4000/34355 queries. users per second: 3022.5
recommendations finished on 5000/34355 queries. users per second: 2907.65
recommendations finished on 6000/34355 queries. users per second: 2864.32
recommendations finished on 7000/34355 queries. users per second: 2858.57
recommendations finished on 8000/34355 queries. users per second: 2788.73
recommendations finished on 9000/34355 queries. users per second: 2754.01
recommendations finished on 10000/34355 queries. users per second: 2763.44
recommendations finished on 11000/34355 queries. users per second: 2776.38
recommendations finished on 12000/34355 queries. users per second: 2794.03
recommendations finished on 13000/34355 queries. users per second: 2816.17
recommendations finished on 14000/34355 queries. users per second: 2830.78
recommendations finished on 15000/34355 queries. users per second: 2847.27
recommendations finished on 16000/34355 queries. users per second: 2848.34
recommendations finished on 17000/34355 queries. users per second: 2868.21
recommendations finished on 18000/34355 queries. users per second: 2878.62
recommendations finished on 19000/34355 queries. users per second: 2890.56
recommendations finished on 20000/34355 queries. users per second: 2898.55
recommendations finished on 21000/34355 queries. users per second: 2897.12
recommendations finished on 22000/34355 queries. users per second: 2891.27
recommendations finished on 23000/34355 queries. users per second: 2877.18
recommendations finished on 24000/34355 queries. users per second: 2869.05
recommendations finished on 25000/34355 queries. users per second: 2874.29
recommendations finished on 26000/34355 queries. users per second: 2866.01
recommendations finished on 27000/34355 queries. users per second: 2862.39
recommendations finished on 28000/34355 queries. users per second: 2855.06
recommendations finished on 29000/34355 queries. users per second: 2856.8
recommendations finished on 30000/34355 queries. users per second: 2860.26
recommendations finished on 31000/34355 queries. users per second: 2860.85
recommendations finished on 32000/34355 queries. users per second: 2860.96
recommendations finished on 33000/34355 queries. users per second: 2858.44
recommendations finished on 34000/34355 queries. users per second: 2862.18
Precision and recall summary statistics by cutoff
+--------+-------------------+-------------------+
| cutoff |   mean_precision  |    mean_recall    |
+--------+-------------------+-------------------+
|   1    | 0.000553049046718 | 0.000104862301006 |
|   2    | 0.000436617668462 | 0.000150208697684 |
|   3    | 0.000456022898171 | 0.000244280304175 |
|   4    | 0.000458448551885 | 0.000362030566009 |
|   5    | 0.000500654926503 | 0.000496565564311 |
|   6    | 0.000509387279872 | 0.000611094195067 |
|   7    | 0.000528099465663 | 0.000763691037949 |
|   8    | 0.000534856643866 |  0.00091093334754 |
|   9    | 0.000559517456621 |  0.00105118785328 |
|   10   | 0.000585067675739 |  0.00123761755428 |
+--------+-------------------+-------------------+
[10 rows x 3 columns]

('\nOverall RMSE: ', 8.23157007329957)

Per User RMSE (best)
+-------------------------------+-------+-------------------+
|            user_id            | count |        rmse       |
+-------------------------------+-------+-------------------+
| b149fc936a8ac2b1a2101f0eb1... |   1   | 0.000117252534705 |
+-------------------------------+-------+-------------------+
[1 rows x 3 columns]


Per User RMSE (worst)
+-------------------------------+-------+---------------+
|            user_id            | count |      rmse     |
+-------------------------------+-------+---------------+
| 50996bbabb6f7857bf0c801943... |   2   | 662.536814472 |
+-------------------------------+-------+---------------+
[1 rows x 3 columns]


Per Item RMSE (best)
+--------------------+-------+-----------------+
|      music_id      | count |       rmse      |
+--------------------+-------+-----------------+
| SOGIWKD12AB018640B |   1   | 0.0336472643078 |
+--------------------+-------+-----------------+
[1 rows x 3 columns]


Per Item RMSE (worst)
+--------------------+-------+---------------+
|      music_id      | count |      rmse     |
+--------------------+-------+---------------+
| SOUAGPQ12A8AE47B3A |   5   | 418.913181081 |
+--------------------+-------+---------------+
[1 rows x 3 columns]
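The precision and recall tables above are reported per cutoff. In that output, M1 (item similarity) reaches a mean precision of about 0.064 at cutoff 10, while M0 (popularity) and M2 (factorization) stay below 0.001, even though M0 has the lowest overall RMSE. The sketch below shows how precision and recall at a cutoff k are commonly computed for one user, assuming precision divides the number of hits by k and recall divides it by the number of held-out items. The function name `precision_recall_at_k` and the toy lists are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user.

    `recommended` is a ranked list of item ids and `relevant` is the set of
    items the user actually interacted with in the held-out data.
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall

recommended = ["a", "b", "c", "d"]  # ranked recommendations for one user
relevant = {"b", "d", "e"}          # that user's held-out items
p2, r2 = precision_recall_at_k(recommended, relevant, 2)
p4, r4 = precision_recall_at_k(recommended, relevant, 4)
```

Averaging these per-user values over all evaluated users gives figures comparable to the `mean_precision` and `mean_recall` columns.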

In [9]:
K = 10
users = gl.SArray(sf['user_id'].unique().head(100))
In [10]:
recs = item_sim_model.recommend(users=users, k=K)
recs.head()
Out[10]:
+----------------------------------------------+--------------------+-----------------+------+
|                   user_id                    |      music_id      |      score      | rank |
+----------------------------------------------+--------------------+-----------------+------+
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | SOXUQNR12AF72A69D6 | 0.302626844715  |  1   |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | SOFISNS12A67ADE5FF | 0.129972689292  |  2   |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | SOGXSWA12A6D4FBC99 | 0.126114996041  |  3   |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | SOHOTTD12A6D4F7035 | 0.115846942453  |  4   |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | SODZBJH12AF72A9CF7 | 0.111501108198  |  5   |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | SONYKOW12AB01849C9 | 0.104462311548  |  6   |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | SOLFXKT12AB017E3E0 | 0.104228854179  |  7   |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | SOAXGDH12A8C13F8A1 | 0.094238214633  |  8   |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | SOPDIDL12A58A7ABF0 | 0.093481474063  |  9   |
| c66c10a9567f0d82ff31441a9fd5063e5cd9dfe8 ... | SOTEZXJ12A8C1365AA | 0.0933470936383 |  10  |
+----------------------------------------------+--------------------+-----------------+------+
[10 rows x 4 columns]