<a href="https://colab.research.google.com/github/ClarenceKaranja/FUTURE-TECH-IMPACT-INDEX/blob/main/RECOMMENDER_SYSTEM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **NAVIGATING TOMORROW: INTRODUCING THE FUTURE TECH IMPACT INDEX**

In the swift current of technological advancement, staying ahead requires more than just keeping up—it demands foresight. Welcome to the "Future Tech Impact Index," a revolutionary project set to transform decision-making in the ever-changing landscape of emerging technologies.

**A Glimpse into the Future:**
The "Future Tech Impact Index" isn't just a tool; it's a compass designed to guide businesses, investors, and policymakers through the complexities of tomorrow's innovations.

**Data-Driven Insights:**
This index leverages data science to provide a comprehensive assessment of emerging technologies. It's not just about information; it's about actionable insights.

**Empowering Decision-Making:**
Imagine having a tool that not only empowers businesses with strategic insights but also lends expertise to policymakers crafting regulations for sustainable progress. That's the "Future Tech Impact Index."

**How It Works in a Snapshot:**
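The index combines automated data collection with a documented methodology aligned with societal goals to turn raw information into impact scores. This notebook covers the recommender component: it ranks patent abstracts against user-supplied keywords using TF-IDF vectors and cosine similarity.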

**Illuminate Tomorrow's Transformations:**
The recommendations made aren't just ranks; they're a guiding light, revealing the transformative potential of emerging technologies.

**In a Nutshell:**
The "Future Tech Impact Index" is your key to navigating the future with confidence. In a world where innovation is constant, this index is your strategic ally—a concise, powerful guide to informed decisions and sustainable progress. Welcome to a new era of technological foresight—welcome to the future.

# Section 1: Downloading NLTK Resources
This section ensures that the necessary NLTK resource for tokenization is available.



In [None]:
import nltk

try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    print('Downloading punkt...')
    nltk.download('punkt')

Downloading punkt...


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


# Section 2: Importing Libraries
This section imports the necessary libraries for data processing and analysis.

In [None]:
try:
    import numpy
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    import re
    from nltk.stem.snowball import SnowballStemmer
except ImportError:
    print('You are missing some packages! '
          'We will try installing them before continuing!')
    # The PyPI package that provides the `sklearn` module is named scikit-learn
    !pip install numpy pandas scikit-learn nltk
    import numpy
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    import re
    from nltk.stem.snowball import SnowballStemmer
    import nltk
    nltk.download('punkt')

print('Done!')

Done!


# Section 3: Set up the Stemmer
Initialize the SnowballStemmer from NLTK to stem words in the abstracts.

In [None]:
# Set up the stemmer
stemmer = SnowballStemmer("english")
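
To make the effect of stemming concrete, here is a minimal sketch that runs the same English Snowball stemmer on a few inflected forms. The sample words are illustrative and are not taken from the patent data.

```python
from nltk.stem.snowball import SnowballStemmer

# Same stemmer configuration as in the notebook
demo_stemmer = SnowballStemmer("english")

# Illustrative words: different inflections of one root
sample_words = ["connected", "connecting", "connections"]
stems = [demo_stemmer.stem(word) for word in sample_words]

print(stems)  # ['connect', 'connect', 'connect']
```

Because all three forms collapse to one stem, a keyword such as "connection" can match abstracts that only mention "connected".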

# Section 4: Load and Clean Data
Load patent abstracts from a CSV file, select the 'ABSTRACTS' column, and drop rows with missing values.

In [None]:
# Load the patents data (the path assumes Google Drive is mounted at /content/drive)
PATH_PATENTS = "/content/drive/MyDrive/content_recommender_data/ABSTRACTS.csv"
patents = pd.read_csv(PATH_PATENTS)
patents = patents[['ABSTRACTS']].dropna()  # select the 'ABSTRACTS' column and drop rows with missing values
abstracts = patents['ABSTRACTS'].tolist()  # convert the abstracts to a list

# Section 5: Clean and Tokenize Abstracts
Define a function to clean and tokenize the abstracts using NLTK and regular expressions.

In [None]:
# Clean and tokenize the abstracts
def clean_tokenize(document):
    # Replace everything except word characters, whitespace, and hyphens with a space.
    # A raw string avoids invalid-escape warnings; \w already covers the underscore.
    document = re.sub(r'[^\w\s-]', ' ', document)
    tokens = nltk.word_tokenize(document)  # split the text into word tokens
    cleaned_abstract = ' '.join([stemmer.stem(item) for item in tokens])  # stem each token and rejoin
    return cleaned_abstract

# Apply the cleaning and tokenization function to each abstract.
cleaned_abstracts = list(map(clean_tokenize, abstracts))
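
The punctuation-stripping step can be examined on its own. The sketch below applies only the regular expression to a made-up sentence and splits on whitespace instead of calling NLTK, so it is an approximation of the first stage rather than the full cleaning function. `strip_symbols` is a hypothetical helper name.

```python
import re

def strip_symbols(text):
    """Hypothetical helper: replace anything that is not a word character,
    whitespace, or a hyphen with a single space."""
    return re.sub(r'[^\w\s-]', ' ', text)

sample = "A deck anchor (40), swivel-plate!"
pieces = strip_symbols(sample).split()

print(pieces)  # ['A', 'deck', 'anchor', '40', 'swivel-plate']
```

Reference numerals such as `40` survive this step, and hyphenated terms stay in one piece.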

# Section 6: Get User Input for Keywords
Prompt the user to input keywords separated by commas.

In [None]:
# Get user input for keywords separated by commas
user_keywords = input("Enter keywords separated by commas and then press Enter: ")

Enter keywords separated by commas and then press Enter: a



# Section 7: Split User Input into a List of Keywords
Split the user input into a list of keywords, removing any leading or trailing whitespace.

In [None]:

# Split user input into a list of keywords
user_keywords_list = [keyword.strip() for keyword in user_keywords.split(',')]
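
Splitting on commas can leave empty strings behind when the input contains a trailing comma or two commas in a row. A minimal sketch of one way to filter them out, using a hypothetical helper `parse_keywords` that is not part of the notebook:

```python
def parse_keywords(raw):
    """Hypothetical helper: split on commas, trim whitespace, drop empty entries."""
    return [part.strip() for part in raw.split(',') if part.strip()]

print(parse_keywords(" machine learning, , robotics,"))  # ['machine learning', 'robotics']
```

An empty result would signal that the user entered nothing usable, which could be checked before computing similarities.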


# Section 8: Combine Keywords for Processing
Combine the cleaned list of keywords into a single string for further processing.

In [None]:
# Combine keywords into a single string for processing
cleaned_user_keywords = ' '.join(user_keywords_list)


# Section 9: Process and Clean User Input Keywords
Apply the same cleaning and tokenization function used for abstracts to the user input keywords.

In [None]:

# Process and clean user input keywords
cleaned_user_keywords = clean_tokenize(cleaned_user_keywords)


# Section 10: Generate TF-IDF Matrix for User Input Keywords and All Patents
Use the TfidfVectorizer to build a TF-IDF matrix for all patent abstracts, then project the user input keywords into the same vocabulary.

In [None]:

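# Generate TF-IDF matrix for user input keywords and all patents
# min_df=2 keeps only terms that occur in at least two abstracts
tfidf_vectorizer = TfidfVectorizer(stop_words='english', min_df=2)
abstract_tfidf_matrix = tfidf_vectorizer.fit_transform(cleaned_abstracts)
# transform (not fit_transform) reuses the vocabulary learned from the abstracts
user_keywords_tfidf_vector = tfidf_vectorizer.transform([cleaned_user_keywords])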
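
To show what the weighting does, here is a small pure-Python sketch of TF-IDF on a toy corpus. It assumes the smoothed IDF that scikit-learn documents for its default settings, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation, and it tokenises on whitespace only. The function name `tfidf_vectors` and the three documents are illustrative, not part of the notebook.

```python
import math
from collections import Counter

def tfidf_vectors(docs, min_df=1):
    """Illustrative TF-IDF: returns (vocabulary, L2-normalised rows)."""
    tokenised = [doc.split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenised for term in set(tokens))
    # Keep only terms that appear in at least min_df documents
    vocab = sorted(term for term, count in df.items() if count >= min_df)
    # Smoothed inverse document frequency
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        row = [counts[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(value * value for value in row))
        rows.append([value / norm if norm else 0.0 for value in row])
    return vocab, rows

docs = ["solar panel mount", "solar cell coating", "deck anchor mount"]
vocab, rows = tfidf_vectors(docs, min_df=2)

print(vocab)  # ['mount', 'solar']
```

With `min_df=2`, terms that occur in a single document (such as "panel" or "anchor") are dropped from the vocabulary. A keyword query made up only of such terms would therefore map to an all-zero vector.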


# Section 11: Calculate Cosine Similarity
Calculate cosine similarity between user input keywords and all patents based on their TF-IDF representations.

In [None]:
# Calculate cosine similarity between user input keywords and all patents
patents_similarity_score = cosine_similarity(abstract_tfidf_matrix, user_keywords_tfidf_vector)
# Row positions of the abstracts, ordered from most to least similar
recommended_patents_id = patents_similarity_score.flatten().argsort()[::-1]
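
The ranking logic can be traced by hand on a tiny example. The sketch below computes cosine similarity in plain Python for three made-up two-dimensional vectors and sorts the row positions by descending score; the vectors and the `cosine` helper are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

doc_vectors = [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]]  # toy document vectors
query = [1.0, 0.0]                                  # toy query vector

scores = [cosine(doc, query) for doc in doc_vectors]
# Row positions ordered from highest to lowest score
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)

print(ranking)  # [1, 2, 0]
```

If the query vector is all zeros, for example because every keyword is a stop word or falls outside the vocabulary, every score is 0 and the resulting order carries no information about relevance.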


# Section 12: Define Number of Recommendations to Display
Prompt the user to input the number of recommendations they want to display.

In [None]:
# Define the number of recommendations you want to display
num_recommendations = int(input("Enter the number of recommendations you want to display and then press Enter: "))

Enter the number of recommendations you want to display and then press Enter: 1
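
Converting the raw input with `int()` raises an error on non-numeric text and accepts zero or negative values. A minimal sketch of one way to validate the count, using a hypothetical helper `parse_count` and an assumed upper bound such as the number of abstracts:

```python
def parse_count(raw, maximum):
    """Hypothetical helper: convert raw text to a count between 1 and maximum."""
    try:
        value = int(raw.strip())
    except ValueError:
        raise ValueError(f"Expected a whole number, got {raw!r}") from None
    # Clamp to the valid range instead of failing on out-of-range values
    return max(1, min(value, maximum))

print(parse_count(" 5 ", maximum=3))  # 3
```

Clamping is one design choice among several; re-prompting the user on invalid input would be an equally reasonable alternative.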


# Section 13: Display Top N Recommended Patents
Display the top N recommended patents along with their full abstract content.

In [None]:
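# argsort returns row positions, not index labels, so look them up with iloc.
# This also keeps the patents in ranked order (most similar first).
recommended_patents = patents['ABSTRACTS'].iloc[recommended_patents_id[:num_recommendations]]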

print(f'\nTop {num_recommendations} Recommended Patents:')
for idx, abstract in zip(recommended_patents.index, recommended_patents):
    print(f'\nPatent ID: {idx}\nAbstract: {abstract}\n{"-"*50}')



Top 1 Recommended Patents:

Patent ID: 93661
Abstract: A deck leverage anchor (40) for securing an external device has an anchor body that is positioned at least partially within an opening of the deck (30) so that a notch receives the edge of the surface. The anchor body comprises a swivel coupler (64’’’) that extends outward from the opening. The swivel coupler couples to an external device. A swivel plate (102) is disposed opposite the swivel coupler to engage and distribute a load on the surface of the deck opposite the coupler.
--------------------------------------------------
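
The indices produced by `argsort` are positions in the list of abstracts, while a DataFrame that has gone through `dropna()` keeps its original index labels. The sketch below, built on a made-up four-row frame and a hypothetical ranking, shows how positional and label-based lookups can diverge.

```python
import pandas as pd

# Toy data with one missing abstract; after dropna the labels are 0, 2, 3
frame = pd.DataFrame({"ABSTRACTS": ["alpha", None, "beta", "gamma"]})
frame = frame[["ABSTRACTS"]].dropna()

ranked_positions = [2, 0]  # hypothetical ranking: best match first

# Positional lookup follows the ranking exactly
by_position = frame["ABSTRACTS"].iloc[ranked_positions]
# Label-based membership test matches labels 0 and 2 and ignores ranking order
by_label = frame.loc[frame.index.isin(ranked_positions), "ABSTRACTS"]

print(by_position.tolist())  # ['gamma', 'alpha']
print(by_label.tolist())     # ['alpha', 'beta']
```

The two results coincide only when no rows were dropped and the order of the results does not matter, so positional lookup is the safer choice for a ranked list.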
