# Brewing Recommendations: A Data-Driven Approach to Coffee Recommendations Using Linear Algebra 

Angelica Tellez and Prof. Johann Thiel

New York City College of Technology, Emerging Scholars Program

## Abstract 

This project explores the application of linear algebra to develop a personalized coffee recommendation system based on individual preferences. We created a mathematical model that uses these preferences to recommend five coffee beans tailored to specific users. Through this research, we demonstrate how linear algebra concepts (such as the dot product and vector normalization) can inform everyday choices, down to the specific coffee we enjoy. In the future, we aim to gather real user data to refine the model, ideally generating accurate recommendations with even fewer inputs.

## Setup

Importing the necessary libraries for the project

In [1]:
import pandas as pd
import numpy as np

Reading the data

In [2]:
coffee_ratings = pd.read_csv("coffee_ratings.csv")

Looking at the data

In [3]:
coffee_ratings.head()

Unnamed: 0,total_cup_points,species,owner,country_of_origin,farm_name,lot_number,mill,ico_number,company,altitude,...,color,category_two_defects,expiration,certification_body,certification_address,certification_contact,unit_of_measurement,altitude_low_meters,altitude_high_meters,altitude_mean_meters
0,90.58,Arabica,metad plc,Ethiopia,metad plc,,metad plc,2014/2015,metad agricultural developmet plc,1950-2200,...,Green,0,"April 3rd, 2016",METAD Agricultural Development plc,309fcf77415a3661ae83e027f7e5f05dad786e44,19fef5a731de2db57d16da10287413f5f99bc2dd,m,1950.0,2200.0,2075.0
1,89.92,Arabica,metad plc,Ethiopia,metad plc,,metad plc,2014/2015,metad agricultural developmet plc,1950-2200,...,Green,1,"April 3rd, 2016",METAD Agricultural Development plc,309fcf77415a3661ae83e027f7e5f05dad786e44,19fef5a731de2db57d16da10287413f5f99bc2dd,m,1950.0,2200.0,2075.0
2,89.75,Arabica,grounds for health admin,Guatemala,"san marcos barrancas ""san cristobal cuch",,,,,1600 - 1800 m,...,,0,"May 31st, 2011",Specialty Coffee Association,36d0d00a3724338ba7937c52a378d085f2172daa,0878a7d4b9d35ddbf0fe2ce69a2062cceb45a660,m,1600.0,1800.0,1700.0
3,89.0,Arabica,yidnekachew dabessa,Ethiopia,yidnekachew dabessa coffee plantation,,wolensu,,yidnekachew debessa coffee plantation,1800-2200,...,Green,2,"March 25th, 2016",METAD Agricultural Development plc,309fcf77415a3661ae83e027f7e5f05dad786e44,19fef5a731de2db57d16da10287413f5f99bc2dd,m,1800.0,2200.0,2000.0
4,88.83,Arabica,metad plc,Ethiopia,metad plc,,metad plc,2014/2015,metad agricultural developmet plc,1950-2200,...,Green,2,"April 3rd, 2016",METAD Agricultural Development plc,309fcf77415a3661ae83e027f7e5f05dad786e44,19fef5a731de2db57d16da10287413f5f99bc2dd,m,1950.0,2200.0,2075.0


Summarizing the statistics of the data

In [4]:
coffee_ratings.describe()

Unnamed: 0,total_cup_points,number_of_bags,aroma,flavor,aftertaste,acidity,body,balance,uniformity,clean_cup,sweetness,cupper_points,moisture,category_one_defects,quakers,category_two_defects,altitude_low_meters,altitude_high_meters,altitude_mean_meters
count,1339.0,1339.0,1339.0,1339.0,1339.0,1339.0,1339.0,1339.0,1339.0,1339.0,1339.0,1339.0,1339.0,1339.0,1338.0,1339.0,1109.0,1109.0,1109.0
mean,82.089851,154.182972,7.566706,7.520426,7.401083,7.535706,7.517498,7.518013,9.834877,9.835108,9.856692,7.503376,0.088379,0.479462,0.173393,3.556385,1750.713315,1799.347775,1775.030545
std,3.500575,129.987162,0.37756,0.398442,0.404463,0.379827,0.370064,0.408943,0.554591,0.763946,0.616102,0.473464,0.048287,2.549683,0.832121,5.312541,8669.440545,8668.805771,8668.62608
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0
25%,81.08,14.0,7.42,7.33,7.25,7.33,7.33,7.33,10.0,10.0,10.0,7.25,0.09,0.0,0.0,0.0,1100.0,1100.0,1100.0
50%,82.5,175.0,7.58,7.58,7.42,7.58,7.5,7.5,10.0,10.0,10.0,7.5,0.11,0.0,0.0,2.0,1310.64,1350.0,1310.64
75%,83.67,275.0,7.75,7.75,7.58,7.75,7.67,7.75,10.0,10.0,10.0,7.75,0.12,0.0,0.0,4.0,1600.0,1650.0,1600.0
max,90.58,1062.0,8.75,8.83,8.67,8.75,8.58,8.75,10.0,10.0,10.0,10.0,0.28,63.0,11.0,55.0,190164.0,190164.0,190164.0


## Data Cleaning 

Selecting the features that impact coffee flavor 

In [5]:
features = ["flavor", "aroma", "aftertaste", "acidity", "body"]

Creating a new data frame that describes the selected features of each coffee bean

In [6]:
items_df = coffee_ratings[features] 

Checking for null values in the data

In [7]:
items_df.count() 
## every column has 1339 non-null entries, matching the row count, so dropna is not needed

flavor        1339
aroma         1339
aftertaste    1339
acidity       1339
body          1339
dtype: int64
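
`count()` reports only non-null entries, so a complete column can still contain unusable values. The summary statistics above show a minimum of 0.0 for every feature, and a row whose scores are all zero cannot be normalized later on. The following is a minimal sketch of a more explicit check, run on a small hypothetical stand-in frame (`toy_df`, with made-up values) rather than on the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for items_df; the values are illustrative only.
toy_df = pd.DataFrame({
    "flavor": [8.83, 0.0, 7.33],
    "aroma": [8.67, 0.0, np.nan],
    "body": [8.50, 0.0, 7.50],
})

# Number of missing values per column (all zeros would mean nothing to drop).
missing_per_column = toy_df.isna().sum()

# Rows in which every score is exactly zero have norm 0 and cannot be normalized.
all_zero_rows = (toy_df == 0).all(axis=1)

# Keep only rows that are complete and not all zero.
keep = toy_df.notna().all(axis=1) & ~all_zero_rows
clean_df = toy_df[keep]
```

Whether such rows should be dropped or kept is a modeling choice; this sketch only shows how they could be detected.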

In [8]:
items_df ## Cleaned Data

Unnamed: 0,flavor,aroma,aftertaste,acidity,body
0,8.83,8.67,8.67,8.75,8.50
1,8.67,8.75,8.50,8.58,8.42
2,8.50,8.42,8.42,8.42,8.33
3,8.58,8.17,8.42,8.42,8.50
4,8.50,8.25,8.25,8.50,8.42
...,...,...,...,...,...
1334,7.58,7.75,7.33,7.58,5.08
1335,7.67,7.50,7.75,7.75,5.17
1336,7.33,7.33,7.17,7.42,7.50
1337,6.83,7.42,6.75,7.17,7.25


## Data Simulation

Building preference vectors for 6 simulated users by drawing each feature from a normal distribution whose mean and standard deviation match that feature's values in the summary statistics above

In [9]:
random_flavor = np.random.normal(7.520426, 0.398442, size=(6, 1)) 
random_aroma = np.random.normal(7.566706, 0.377560, size=(6, 1)) 
random_aftertaste = np.random.normal(7.401083, 0.404463, size=(6, 1)) 
random_acidity = np.random.normal(7.535706, 0.379827, size=(6, 1)) 
random_body = np.random.normal(7.517498, 0.370064, size=(6, 1)) 
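
The means and standard deviations above are typed in by hand from the summary table. As a hedged alternative, the sketch below computes them from the data and draws all features in one call with a seeded generator so the simulation is repeatable. The frame `toy_items`, the seed value, and the variable names are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for items_df; the values are illustrative only.
toy_items = pd.DataFrame({
    "flavor": [8.83, 8.67, 7.33, 6.83],
    "aroma": [8.67, 8.75, 7.33, 7.42],
    "body": [8.50, 8.42, 7.50, 7.25],
})

n_users = 6
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility

# Per-feature sample mean and standard deviation, in column order.
means = toy_items.mean().to_numpy()
stds = toy_items.std().to_numpy()

# Each column j is drawn from Normal(means[j], stds[j]) via broadcasting.
simulated = rng.normal(loc=means, scale=stds, size=(n_users, len(means)))

toy_users = pd.DataFrame(simulated, columns=toy_items.columns)
```

Naming the user columns after the features keeps the user and item frames aligned if the feature list changes.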

Creating user data frame 

In [10]:
user = np.hstack([random_flavor, random_aroma, random_aftertaste, random_acidity, random_body])
user_df = pd.DataFrame(user) ## User - Rating Matrix
user_df

Unnamed: 0,0,1,2,3,4
0,7.386126,7.88327,7.751276,7.434452,6.96704
1,7.921185,7.905825,6.829184,7.200268,7.393692
2,7.939069,7.290982,7.517085,7.083104,7.375463
3,7.466378,7.073184,7.337399,7.566636,7.646623
4,7.247263,7.802583,7.733184,7.44335,8.410074
5,7.632437,7.800871,7.770667,8.134195,6.72674


## Recommendation Function

Defining a function that compares each user's preference vector in the user data frame with every item in the items data frame and returns the most similar item, with its features, for each user

Note:

* In linear algebra, the dot product (or scalar product) measures the alignment between two vectors or, in our context, how similar a user's preferences are to each item. To make the resulting scalar interpretable, we normalize each vector to unit length before calculating the dot product. The dot product of two unit vectors equals the cosine of the angle between them, which lies between -1 and 1; because all scores here are non-negative, it lies between 0 and 1. Therefore, the closer the scalar is to 1, the more closely the coffee bean matches the user's preferences.
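
To make the note concrete, here is a small worked sketch of the normalized dot product on made-up vectors. The helper name `cosine_similarity` and all of the numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def cosine_similarity(u, v):
    """Dot product of u and v after scaling each to unit length."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u / np.linalg.norm(u), v / np.linalg.norm(v)))

user = [8.0, 7.0, 7.5]            # hypothetical preference vector
same_profile = [4.0, 3.5, 3.75]   # same proportions, half the magnitude
different = [1.0, 9.0, 2.0]       # very different proportions

sim_same = cosine_similarity(user, same_profile)   # equal to 1 up to rounding
sim_diff = cosine_similarity(user, different)      # strictly smaller than sim_same
```

Because normalization removes magnitude, a vector with the same proportions but uniformly lower scores still reaches a similarity of 1. The measure therefore compares the shape of a flavor profile, not the overall score level.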

In [13]:

def recommendation(user_matrix, item_matrix):

    ## Building the Result Matrix by calculating the normalized dot product between the user and item matrices.
    Result = np.empty((len(user_matrix), len(item_matrix)))
    for i in range(len(user_matrix)):
        user_vec = user_matrix.iloc[i]
        user_vec_norm = user_vec / np.linalg.norm(user_vec)  ## unit-length user vector
        for j in range(len(item_matrix)):
            item_vec = item_matrix.iloc[j]
            item_vec_norm = item_vec / np.linalg.norm(item_vec)  ## unit-length item vector
            dot_product = np.dot(user_vec_norm, item_vec_norm)
            Result[i, j] = float(dot_product)

    Result_df = pd.DataFrame(Result) ## Converting the Result Matrix to a data frame.

    ## For each row (user), the column holding the maximum value is the item most similar to that user.
    max_col = Result_df.idxmax(axis=1) 
    
    ## Displaying Recommendations
    final = pd.DataFrame()
    ind = []
    for i in range(len(max_col)):
        final[i] = item_matrix.iloc[max_col.iloc[i]]
        ind.append(str(max_col.iloc[i]))

    final.columns = ind

    return final.transpose()
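
The nested loops above compute one dot product at a time. As a hedged alternative, the same similarities can be obtained with a single matrix product after normalizing the rows of both matrices. The sketch below is one possible vectorized variant, not the notebook's method: the function name `recommendation_vectorized` and the toy frames are hypothetical, and unlike the original it explicitly skips items whose norm is zero and keeps the items' original index labels.

```python
import numpy as np
import pandas as pd

def recommendation_vectorized(user_matrix, item_matrix):
    """Return, for each user, the item row with the highest normalized dot product."""
    users = user_matrix.to_numpy(dtype=float)
    items = item_matrix.to_numpy(dtype=float)

    # Items with zero norm cannot be normalized, so they are excluded.
    item_norms = np.linalg.norm(items, axis=1)
    valid = item_norms > 0

    # Scale every row to unit length.
    users_unit = users / np.linalg.norm(users, axis=1, keepdims=True)
    items_unit = items[valid] / item_norms[valid, None]

    # Entry (i, j) is the similarity between user i and valid item j.
    similarity = users_unit @ items_unit.T
    best = similarity.argmax(axis=1)

    return item_matrix[valid].iloc[best]

# Toy data, illustrative values only.
toy_items = pd.DataFrame({"flavor": [8.0, 0.0, 7.0], "body": [7.0, 0.0, 8.0]})
toy_users = pd.DataFrame([[8.1, 7.0], [6.9, 8.0]])
result = recommendation_vectorized(toy_users, toy_items)
```

In this toy case the first user is matched with item 0 and the second with item 2, while the all-zero item 1 is never considered.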


## Results

Passing the user and items data frames to the recommendation function to obtain the results

In [14]:
recommendation(user_df, items_df)

Unnamed: 0,flavor,aroma,aftertaste,acidity,body
1155,7.33,7.58,7.5,7.33,7.0
379,7.67,7.67,7.0,7.08,7.0
807,7.67,7.33,7.42,7.17,7.33
143,7.75,7.42,7.75,8.0,8.0
823,7.08,7.67,7.42,7.33,7.75
904,7.5,7.67,7.42,7.5,6.75
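
The function above returns a single best match per user, while the abstract describes recommending five beans. One possible way to extend the selection step is to rank each row of the similarity matrix and keep the top k columns. The sketch below assumes a precomputed similarity matrix; the helper name `top_k_items` and the toy values are hypothetical.

```python
import numpy as np

def top_k_items(similarity, k=5):
    """Return the column positions of the k largest values in each row, best first."""
    similarity = np.asarray(similarity, dtype=float)
    # Sorting the negated matrix in ascending order ranks columns from most to least similar.
    order = np.argsort(-similarity, axis=1)
    return order[:, :k]

# Toy similarity matrix: 2 users (rows) by 4 items (columns), illustrative values only.
toy_similarity = np.array([
    [0.91, 0.99, 0.95, 0.80],
    [0.97, 0.85, 0.99, 0.98],
])

top2 = top_k_items(toy_similarity, k=2)
```

Here `top2` holds items 1 and 2 for the first user and items 2 and 3 for the second. The returned positions could then be passed to `iloc` on the items data frame to look up the corresponding features.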


## Conclusion

This project demonstrates how linear algebra can be used to build a personalized coffee recommendation system. Although the function was applied here to coffee, it can be adapted to any dataset in which users and items are described by the same numeric features, making it versatile for a range of applications.

Overall, this approach shows how concepts from linear algebra can inform everyday decisions.