# Recommender Systems

### Recommendation systems are a set of algorithms that recommend the most relevant items to users based on their predicted preferences. They act on behavioral data, such as a customer's previous purchases, ratings, or reviews, to predict the likelihood of the customer buying a new product or service.

##### Three algorithms widely used for building recommendation systems:
1. Association Rules
2. Collaborative Filtering
3. Matrix Factorization

## Association rule
#### Association rule mining finds combinations of items that frequently occur together in orders or baskets. A set of items is called an itemset, and itemsets that occur together often are called frequent itemsets. Frequent itemsets reveal relationships between items that people buy together, which can then drive strategies such as bundling products into combo offers or placing products next to each other on retail shelves to attract customer attention. A common application of association rule mining is Market Basket Analysis (MBA).
Association rule mining considers all possible combinations of items in previous baskets and computes measures such as support, confidence, and lift to identify rules with strong associations.
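
To make "all possible combinations" concrete, here is a minimal Python sketch that enumerates every item pair in each basket and counts how many baskets contain it. The baskets and the `count_itemsets` helper are hypothetical, for illustration only. Exhaustive enumeration grows combinatorially with basket size, which is why practical algorithms such as Apriori prune candidate itemsets instead of counting every one.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(baskets, size):
    """Count how many baskets contain each itemset of the given size."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so that {A, B} and {B, A} share one key.
        items = sorted(set(basket))
        for itemset in combinations(items, size):
            counts[itemset] += 1
    return counts


# Hypothetical baskets, for illustration only.
baskets = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
]

pair_counts = count_itemsets(baskets, 2)
for itemset, count in pair_counts.most_common():
    print(itemset, count)
```

Passing `size=3` instead counts triples; dividing each count by `len(baskets)` turns it into the support defined below.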

# Metrics
## Support:
Support is the fraction of all baskets under consideration that contain the items together.

## Confidence:
Confidence measures the proportion of transactions containing X that also contain Y. In a rule X → Y, X is called the antecedent and Y the consequent.

## Lift
Lift can be interpreted as the degree of association between two items. A lift value of 1 indicates that the items are independent (no association); a lift value less than 1 implies that the products are substitutes (purchasing one decreases the probability of purchasing the other); and a lift value greater than 1 indicates that purchasing product X increases the probability of purchasing product Y. In practice, a rule is usually kept as an association rule only if its lift is greater than 1.

The metrics **Support**, **Confidence**, and **Lift** can be illustrated with a simple market basket analysis example, a common application in recommender systems.

### Example Scenario
Imagine we have a small grocery store and we want to analyze the purchasing patterns of our customers. Here are five transactions:

| Transaction ID | Items Purchased                |
|----------------|--------------------------------|
| T1             | Milk, Bread, Butter            |
| T2             | Bread, Butter                  |
| T3             | Milk, Bread                    |
| T4             | Milk, Butter                   |
| T5             | Bread, Butter, Jam             |

### Metrics

1. **Support**
   - **Definition**: Support is the proportion of transactions in the dataset that contain a particular itemset.
   - **Formula**: 
     \[
     \text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}}
     \]
   - **Example**: For the itemset {Bread, Butter}:
     \[
     \text{Support}(\{Bread, Butter\}) = \frac{3}{5} = 0.6
     \]
     This means 60% of the transactions contain both Bread and Butter.

2. **Confidence**
   - **Definition**: Confidence is the proportion of transactions containing item A that also contain item B.
   - **Formula**: 
     \[
     \text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}
     \]
   - **Example**: For the rule {Bread} → {Butter}:
     \[
     \text{Confidence}(\{Bread\} \rightarrow \{Butter\}) = \frac{\text{Support}(\{Bread, Butter\})}{\text{Support}(\{Bread\})} = \frac{0.6}{0.8} = 0.75
     \]
     This means that 75% of the transactions that contain Bread also contain Butter.

3. **Lift**
   - **Definition**: Lift measures the strength of an association rule relative to the co-occurrence expected by chance. It is the ratio of the observed support of A and B together to the support expected if A and B were independent.
   - **Formula**: 
     \[
     \text{Lift}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A) \times \text{Support}(B)}
     \]
   - **Example**: For the rule {Bread} → {Butter}:
     \[
     \text{Lift}(\{Bread\} \rightarrow \{Butter\}) = \frac{\text{Support}(\{Bread, Butter\})}{\text{Support}(\{Bread\}) \times \text{Support}(\{Butter\})} = \frac{0.6}{0.8 \times 0.8} = 0.9375
     \]
     A lift value of 0.9375 indicates that Bread and Butter are slightly less likely to be bought together than if they were independent.
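
The three formulas can be checked against the worked example with a short Python sketch. The helper names `support`, `confidence`, and `lift` are illustrative, not taken from a particular library. Transactions are modeled as Python sets, so `itemset <= t` tests whether transaction `t` contains every item of the itemset.

```python
# The five transactions (T1 to T5) from the example table.
transactions = [
    {"Milk", "Bread", "Butter"},  # T1
    {"Bread", "Butter"},          # T2
    {"Milk", "Bread"},            # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Butter", "Jam"},   # T5
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of A and B together, divided by Support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Support of A and B together, divided by Support(A) * Support(B)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / (
        support(antecedent, transactions) * support(consequent, transactions)
    )


# Round to four decimals to hide floating-point noise in the last digit.
print(round(support({"Bread", "Butter"}, transactions), 4))
print(round(confidence({"Bread"}, {"Butter"}, transactions), 4))
print(round(lift({"Bread"}, {"Butter"}, transactions), 4))
```

The three printed values should reproduce the hand-computed 0.6, 0.75, and 0.9375.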

### Summary
- **Support**: Indicates how frequently an itemset appears in the dataset.
- **Confidence**: Measures how often items in B appear in transactions that contain A.
- **Lift**: Evaluates the strength of an association rule compared to random chance.

These metrics help in understanding the relationships between items and are crucial for building effective recommender systems¹².
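
Following the Lift section, which treats lift greater than 1 as the criterion for keeping a rule, the sketch below brute-forces every single-item rule A → B over the same five example transactions and keeps those whose lift exceeds a threshold. `pairwise_rules` and its `min_lift` parameter are hypothetical names. The sketch handles only one item on each side of a rule and rescans every transaction for every pair, which is fine for toy data but not for a real store.

```python
from itertools import permutations

# The five example transactions (T1 to T5) from the table above.
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Butter"},
    {"Bread", "Butter", "Jam"},
]


def pairwise_rules(transactions, min_lift=1.0):
    """Return single-item rules (A, B, support, confidence, lift) with lift > min_lift."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    # Support of each individual item.
    item_support = {i: sum(i in t for t in transactions) / n for i in items}
    rules = []
    # permutations yields ordered pairs, so A -> B and B -> A are scored separately.
    for a, b in permutations(items, 2):
        pair_support = sum({a, b} <= t for t in transactions) / n
        if pair_support == 0:
            continue  # the two items never co-occur, so there is no rule
        conf = pair_support / item_support[a]
        lift = pair_support / (item_support[a] * item_support[b])
        if lift > min_lift:
            rules.append((a, b, pair_support, conf, lift))
    # Strongest associations first.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


for a, b, sup, conf, lift in pairwise_rules(transactions):
    print(f"{{{a}}} -> {{{b}}}: support={sup:.2f}, confidence={conf:.2f}, lift={lift:.4f}")
```

On this toy data only the rules involving Jam pass (lift 1.25 in both directions), while {Bread} → {Butter}, despite having the highest pair support, is filtered out with lift 0.9375. High support alone does not imply a positive association.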


Source: Conversation with Copilot, 6/11/2024
(1) Association Rule Mining Explained With Examples. https://codinginfinite.com/association-rule-mining-explained-with-examples/.
(2) Understanding Support, Confidence, Lift for Market Basket (Affinity .... https://www.thedataschool.co.uk/liu-zhang/understanding-lift-for-market-basket-analysis/.
(3) Association Rule Mining Explained With Examples. https://bing.com/search?q=Recommender+system+metrics+Support+confidence+lift+example.
(4) An Application in Retail using Python - Sogeti Labs. https://labs.sogeti.com/recommender-systems-using-apriori/.
(5) 10 metrics to evaluate recommender and ranking systems - Evidently AI. https://www.evidentlyai.com/ranking-metrics/evaluating-recommender-systems.

In [12]:
import pandas as pd
help(pd.read_csv)


In [13]:
import pandas as pd
# groceries.csv has no header row: each line is one basket, one item per column
groceries_df = pd.read_csv("groceries.csv", header=None)

In [14]:
groceries_df.head(10)

|   | 0 | 1 | 2 | 3 | 4 | 5 | ... | 31 |
|---|---|---|---|---|---|---|-----|----|
| 0 | citrus fruit | semi-finished bread | margarine | ready soups | | | ... | |
| 1 | tropical fruit | yogurt | coffee | | | | ... | |
| 2 | whole milk | | | | | | ... | |
| 3 | pip fruit | yogurt | cream cheese | meat spreads | | | ... | |
| 4 | other vegetables | whole milk | condensed milk | long life bakery product | | | ... | |
| 5 | whole milk | butter | yogurt | rice | abrasive cleaner | | ... | |
| 6 | rolls/buns | | | | | | ... | |
| 7 | other vegetables | UHT-milk | rolls/buns | bottled beer | liquor (appetizer) | | ... | |
| 8 | pot plants | | | | | | ... | |
| 9 | whole milk | cereals | | | | | ... | |
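
Before support, confidence, or lift can be computed on the groceries data, each row needs to become a list of items. Judging from the `head(10)` output, each row is one basket padded with empty (NaN) cells, so a minimal pandas sketch drops the missing cells per row and then builds a one-hot basket matrix whose column means are the single-item supports. The helper names `to_transactions` and `one_hot` are hypothetical, `sample_df` is a stand-in built from the first three rows shown above, and the `|` separator assumes no item name contains that character. In the notebook itself, `to_transactions(groceries_df)` would replace the sample.

```python
import pandas as pd


def to_transactions(df):
    """Turn a wide basket table (one basket per row, missing-value padding) into lists of items."""
    return [row.dropna().tolist() for _, row in df.iterrows()]


def one_hot(transactions):
    """One row per basket, one 0/1 column per item (columns sorted by item name)."""
    # Join each basket with a separator assumed absent from item names, then split into dummies.
    return pd.Series(["|".join(t) for t in transactions]).str.get_dummies(sep="|")


# Hypothetical stand-in for groceries_df: the first three baskets shown above, padded with None.
sample_df = pd.DataFrame([
    ["citrus fruit", "semi-finished bread", "margarine", "ready soups"],
    ["tropical fruit", "yogurt", "coffee", None],
    ["whole milk", None, None, None],
])

baskets = to_transactions(sample_df)
basket_matrix = one_hot(baskets)
# The mean of a 0/1 column is the fraction of baskets containing that item, i.e. its support.
item_support = basket_matrix.mean().sort_values(ascending=False)
print(baskets)
print(item_support)
```

The one-hot layout is the usual input format for frequent-itemset tools, and the support of an itemset is the fraction of rows where all of its columns are 1.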
