A collaborative filtering approach is a natural fit for this recommendation system. Collaborative filtering finds similarities between users based on their behavior or preferences. Here, the maskid and campaign columns (cif_id and Campaign in the sample file used below) identify users who have attended similar campaigns, and campaigns are then recommended to a user based on what those similar users attended.


In [None]:
import pandas as pd

# Load the data from the Excel file
data = pd.read_excel('wellAP.xlsx')

data.head()

In [None]:
data.info()

The code below creates a pivot table with maskid (the cif_id column here) as rows and campaign as columns. Each value is the number of times that user attended that campaign.

In [None]:
# Create a pivot table with maskid as rows, campaign as columns, and count of campaign as values
pivot_table = pd.pivot_table(data, index='cif_id', columns='Campaign', aggfunc='size', fill_value=0)

pivot_table
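To make the shape of the pivot table concrete, here is a minimal sketch on a toy attendance log. The IDs and campaign names are made up for illustration; only the `pd.pivot_table` call mirrors the cell above.

```python
import pandas as pd

# Toy attendance log (made-up IDs and campaign names, for illustration only)
toy = pd.DataFrame({
    "cif_id": [1, 1, 1, 2, 3, 3],
    "Campaign": ["A", "A", "B", "A", "B", "C"],
})

# Each cell counts how many rows exist for that (user, campaign) pair;
# pairs that never occur are filled with 0
toy_pivot = pd.pivot_table(toy, index="cif_id", columns="Campaign",
                           aggfunc="size", fill_value=0)
print(toy_pivot)
```

User 1 appears twice for campaign A, so that cell holds 2 rather than 1. The table stores counts, not yes/no flags.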


The code below calculates the similarity between users with the cosine similarity metric and stores the result in a DataFrame. The get_similar_users function returns the n users most similar to a given user. The get_campaign_recommendations function then recommends campaigns to a user based on the behavior of those similar users. Note that n is used both as the number of similar users and as the number of campaigns returned.

In [None]:
# Calculate the similarity between users using the cosine similarity metric
from sklearn.metrics.pairwise import cosine_similarity
user_similarity = cosine_similarity(pivot_table)

# Convert the similarity matrix into a DataFrame
user_similarity_df = pd.DataFrame(user_similarity, index=pivot_table.index, columns=pivot_table.index)

# Function to get top n similar users for a given user
def get_similar_users(user, n):
    # Get the similarity scores for the given user
    similarity_scores = user_similarity_df[user]
    # Sort the similarity scores in descending order
    sorted_scores = similarity_scores.sort_values(ascending=False)
    # Drop the user itself, then keep the n highest-scoring users
    # (with tied scores, the user is not guaranteed to be the first row)
    top_similar_users = sorted_scores.drop(user).iloc[:n]
    return top_similar_users

# Function to get campaign recommendations for a given user
def get_campaign_recommendations(user, n):
    # Get the top n most similar users for the given user
    similar_users = get_similar_users(user, n)
    # Get the campaigns attended by the similar users
    similar_users_campaigns = pivot_table.loc[similar_users.index]
    # Calculate the mean attendance for each campaign across the similar users
    mean_attendance = similar_users_campaigns.mean()
    # Round the mean attendance values to 4 decimal places
    mean_attendance = mean_attendance.round(4)
    # Sort the mean attendance in descending order
    sorted_mean_attendance = mean_attendance.sort_values(ascending=False)
    # Get the top n recommended campaigns
    recommended_campaigns = sorted_mean_attendance.iloc[:n]
    return recommended_campaigns
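As a rough illustration of what the cosine similarity score measures, the sketch below computes it by hand with NumPy for three hypothetical users. The vectors are invented and the `cosine` helper is written only for this example; the notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import numpy as np

# Hypothetical attendance counts for three users over campaigns A, B, C
u1 = np.array([2, 1, 0])
u2 = np.array([4, 2, 0])   # same pattern as u1, just more visits
u3 = np.array([0, 0, 3])   # no campaigns in common with u1

def cosine(a, b):
    """Cosine of the angle between two count vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(cosine(u1, u2), 4))  # 1.0
print(round(cosine(u1, u3), 4))  # 0.0
```

Cosine similarity depends on the attendance pattern and not on the total number of visits, so several users can tie at exactly 1.0 with the target user.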



In [None]:
# Example usage: get top 5 campaign recommendations for user 916783
print(get_campaign_recommendations(916783, 5))


See the explanation after the last example below for how to interpret these values.

**NOTE**: Please use this code on the original dataset with actual campaign names. Right now, the campaigns differ from one another only by a digit.

In [None]:
# Check which campaigns this user has attended
pivot_table.loc[3999499]

# This indicates that user 3999499 has attended the Will Writing Workshop campaign 4 times.

In [None]:
# Example usage: get top 5 campaign recommendations for user 3999499
print(get_campaign_recommendations(3999499, 5))

The values in the final output are the mean attendance of each recommended campaign across the similar users. The get_campaign_recommendations function averages the pivot-table counts for each campaign over the similar users and returns the n campaigns with the highest mean.

Because the pivot table stores attendance counts, a value of 0.0 means that none of the similar users attended the campaign, and larger values mean more attendance per similar user on average. A value can exceed 1.0 when users attend the same campaign several times.

When every similar user attended each campaign at most once, the mean equals the proportion of similar users who attended: 1.0 means all of them did, 0.75 means 75% of them did, and 0.25 means only 25% did.
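Since the pivot table holds counts, the mean can differ from the share of similar users who attended at least once. The sketch below contrasts the two on invented rows; the `similar` frame and its user IDs are hypothetical stand-ins for `pivot_table.loc[similar_users.index]`.

```python
import pandas as pd

# Hypothetical pivot rows for three similar users (values are attendance counts)
similar = pd.DataFrame(
    {"A": [4, 0, 0], "B": [1, 1, 1], "C": [0, 0, 0]},
    index=[101, 102, 103],
)

mean_count = similar.mean()      # average number of attendances per user
share = similar.gt(0).mean()     # fraction of users who attended at least once

print(pd.DataFrame({"mean_count": mean_count.round(4), "share": share.round(4)}))
```

Campaign A gets a mean of about 1.33 even though only one of three users attended it, while campaign B scores 1.0 on both measures. If a proportion is what you want to report, binarizing with `gt(0)` before averaging is one option.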

**Is this recommendation system accurate?**

This cannot be determined yet, because there is no historical data against which to test the predictions. Once recommendations have been issued, we can record whether users go on to attend the recommended campaigns and use those outcomes to evaluate the system.
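Once follow-up attendance has been captured, one simple way to score the system is a hit rate: the share of users who later attended at least one campaign that was recommended to them. The sketch below is only an assumption about how such data might be organized; the `hit_rate` function, the dictionary layout, and the sample values are all hypothetical.

```python
def hit_rate(recommended, attended_later):
    """Share of users who later attended at least one recommended campaign.

    Both arguments are dicts keyed by user ID (hypothetical structure):
    recommended maps to a list of campaign names, attended_later to a set.
    """
    # Only score users for whom follow-up attendance is known
    users = [u for u in recommended if u in attended_later]
    if not users:
        return float("nan")
    hits = sum(1 for u in users if set(recommended[u]) & set(attended_later[u]))
    return hits / len(users)

# Made-up example data
recs = {1: ["A", "B"], 2: ["C"], 3: ["A"]}
later = {1: {"B"}, 2: {"A"}, 3: {"A", "C"}}
print(hit_rate(recs, later))
```

Here two of the three users attended something they were recommended, so the hit rate is 2/3.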

**Can we add more columns, such as gender, to the recommendation system?**

Yes, they can be added. However, I did not see any significant difference when filtering by gender. This could be because the dataset is small, or it may need to be checked with more and different maskid values.

Below is the code I tried, using the maskid, campaign and gender columns.

In [None]:
# Function to get campaign recommendations for a given user and gender
def get_campaign_recommendations(user, gender, n):
    # Filter the data based on the selected gender
    data_gender = data[data['gender'] == gender]
    
    # Create a pivot table with maskid as rows, campaign as columns, and count of campaign as values
    pivot_table = pd.pivot_table(data_gender, index='cif_id', columns='Campaign', aggfunc='size', fill_value=0)

    # Calculate the similarity between users using the cosine similarity metric
    from sklearn.metrics.pairwise import cosine_similarity
    user_similarity = cosine_similarity(pivot_table)

    # Convert the similarity matrix into a DataFrame
    user_similarity_df = pd.DataFrame(user_similarity, index=pivot_table.index, columns=pivot_table.index)

    # Function to get top n similar users for a given user
    def get_similar_users(user, n):
        # Get the similarity scores for the given user
        similarity_scores = user_similarity_df[user]
        # Sort the similarity scores in descending order
        sorted_scores = similarity_scores.sort_values(ascending=False)
        # Drop the user itself, then keep the n highest-scoring users
        # (with tied scores, the user is not guaranteed to be the first row)
        top_similar_users = sorted_scores.drop(user).iloc[:n]
        return top_similar_users

    # Get the top n most similar users for the given user
    similar_users = get_similar_users(user, n)
    # Get the campaigns attended by the similar users
    similar_users_campaigns = pivot_table.loc[similar_users.index]
    # Calculate the mean attendance for each campaign across the similar users
    mean_attendance = similar_users_campaigns.mean()
    # Round the mean attendance values to 4 decimal places
    mean_attendance = mean_attendance.round(4)
    # Sort the mean attendance in descending order
    sorted_mean_attendance = mean_attendance.sort_values(ascending=False)
    # Get the top n recommended campaigns
    recommended_campaigns = sorted_mean_attendance.iloc[:n]
    return recommended_campaigns

# Example usage: get top 3 campaign recommendations for user 3999499, restricted to gender "M"
print(get_campaign_recommendations(3999499, "M", 3))

In [None]:
# Function to get campaign recommendations for a given user and zip code
def get_campaign_recommendations(user, cfzip_partial, n):
    # Filter the data based on the selected partial zip code
    data_cfzip_partial = data[data['cfzip_partial'] == cfzip_partial]
    
    # Create a pivot table with maskid as rows, campaign as columns, and count of campaign as values
    pivot_table = pd.pivot_table(data_cfzip_partial, index='cif_id', columns='Campaign', aggfunc='size', fill_value=0)

    # Calculate the similarity between users using the cosine similarity metric
    from sklearn.metrics.pairwise import cosine_similarity
    user_similarity = cosine_similarity(pivot_table)

    # Convert the similarity matrix into a DataFrame
    user_similarity_df = pd.DataFrame(user_similarity, index=pivot_table.index, columns=pivot_table.index)

    # Function to get top n similar users for a given user
    def get_similar_users(user, n):
        # Get the similarity scores for the given user
        similarity_scores = user_similarity_df[user]
        # Sort the similarity scores in descending order
        sorted_scores = similarity_scores.sort_values(ascending=False)
        # Drop the user itself, then keep the n highest-scoring users
        # (with tied scores, the user is not guaranteed to be the first row)
        top_similar_users = sorted_scores.drop(user).iloc[:n]
        return top_similar_users

    # Get the top n most similar users for the given user
    similar_users = get_similar_users(user, n)
    # Get the campaigns attended by the similar users
    similar_users_campaigns = pivot_table.loc[similar_users.index]
    # Calculate the mean attendance for each campaign across the similar users
    mean_attendance = similar_users_campaigns.mean()
    # Round the mean attendance values to 4 decimal places
    mean_attendance = mean_attendance.round(4)
    # Sort the mean attendance in descending order
    sorted_mean_attendance = mean_attendance.sort_values(ascending=False)
    # Get the top n recommended campaigns
    recommended_campaigns = sorted_mean_attendance.iloc[:n]
    return recommended_campaigns

# Example usage: get top 3 campaign recommendations for user 3999499, restricted to zip prefix 520
print("Recommended", get_campaign_recommendations(3999499,520,3))


In [None]:
# Function to get campaign recommendations for a given user, gender and zip code
def get_campaign_recommendations(user, cfzip_partial, gender, n):
    # Keep only the rows that match both the selected gender and the partial zip code
    data_filtered = data[(data['gender'] == gender) & (data['cfzip_partial'] == cfzip_partial)]
    
    # Create a pivot table with maskid as rows, campaign as columns, and count of campaign as values
    pivot_table = pd.pivot_table(data_filtered, index='cif_id', columns='Campaign', aggfunc='size', fill_value=0)

    # Calculate the similarity between users using the cosine similarity metric
    from sklearn.metrics.pairwise import cosine_similarity
    user_similarity = cosine_similarity(pivot_table)

    # Convert the similarity matrix into a DataFrame
    user_similarity_df = pd.DataFrame(user_similarity, index=pivot_table.index, columns=pivot_table.index)

    # Function to get top n similar users for a given user
    def get_similar_users(user, n):
        # Get the similarity scores for the given user
        similarity_scores = user_similarity_df[user]
        # Sort the similarity scores in descending order
        sorted_scores = similarity_scores.sort_values(ascending=False)
        # Drop the user itself, then keep the n highest-scoring users
        # (with tied scores, the user is not guaranteed to be the first row)
        top_similar_users = sorted_scores.drop(user).iloc[:n]
        return top_similar_users

    # Get the top n most similar users for the given user
    similar_users = get_similar_users(user, n)
    # Get the campaigns attended by the similar users
    similar_users_campaigns = pivot_table.loc[similar_users.index]
    # Calculate the mean attendance for each campaign across the similar users
    mean_attendance = similar_users_campaigns.mean()
    # Round the mean attendance values to 4 decimal places
    mean_attendance = mean_attendance.round(4)
    # Sort the mean attendance in descending order
    sorted_mean_attendance = mean_attendance.sort_values(ascending=False)
    # Get the top n recommended campaigns
    recommended_campaigns = sorted_mean_attendance.iloc[:n]
    return recommended_campaigns

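# Example usage: get top 3 campaign recommendations for user 3999499, zip prefix 520 and gender "M"
# print("Recommended", get_campaign_recommendations(3999499, 520, "M", 3))

cifID = int(input("Cif ID: "))
# Converted to int on the assumption that cfzip_partial holds integers, as in the example call (520)
postal_code = int(input("First 3 digits of your postal code: "))
# Enter the gender code exactly as it is stored in the data (e.g. "M")
user_gender = input("Gender code (e.g. M): ")
NoofRec = int(input("Number of recommendations: "))


print(get_campaign_recommendations(cifID, postal_code, user_gender, NoofRec))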


**Some other variations that you can try?**

You can run the same code on two separate datasets, one filtered for male users and one for female users, after dropping the rows with missing gender. The recommendations **may** be more accurate this way, but that cannot be tested without historical data.
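A minimal sketch of that split on toy data is shown below. The column names follow the sample file, while the rows and the gender codes "M" and "F" are assumptions made for illustration.

```python
import pandas as pd

# Toy data with one missing gender (values are made up)
toy = pd.DataFrame({
    "cif_id": [1, 2, 3, 4],
    "Campaign": ["A", "B", "A", "C"],
    "gender": ["M", "F", None, "M"],
})

# Drop rows where gender is missing, then build one subset per gender value
known = toy.dropna(subset=["gender"])
subsets = {g: grp for g, grp in known.groupby("gender")}

for g, grp in subsets.items():
    print(g, len(grp))
```

Each subset can then be passed through the same pivot-table and similarity steps independently.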

Remember to run this code on your original data after correcting the column names.