This project extracts trending topics from posts in real time. It uses natural language processing (NLP) techniques to analyze the posts, and it stores the resulting trends in Supabase for later querying.
Technologies used:
- Supabase: Database to store the trends.
- Compromise: NLP library for text processing.
- Google Gemini: Generative AI API to classify topics.
Features:
- N-grams Extraction: Extraction of words, phrases, and hashtags from posts.
- Content Filtering: Filtering of stopwords, blacklist words, and irrelevant content.
- Text Classification: Topic classification using a text classifier.
- Trend Storage: Storage of trends in Supabase.
Prerequisites:
- Node.js: Install Node.js to run the project.
- Supabase (optional): Set up an account on Supabase and obtain the necessary credentials.
- Google Gemini API Key (optional): Set up an account on Google Cloud and create a project with the Generative Language API (Gemini API) enabled. Then obtain the API key from https://aistudio.google.com/app/apikey.
The use of Supabase is optional and Google Gemini is used only to classify topics. You can replace these services with others of your choice.
- Clone the repository:
    git clone https://github.com/Rafael-BD/Bsky-Trends
    cd Bsky-Trends
- Set up the environment variables in the .env file:
    SUPABASE_URL=your_supabase_url
    SVC_KEY=your_supabase_key
    GOOGLE_API_KEY=your_google_api_key
    DEV=true # Set to false in production
- Create a Supabase table named trends with the following columns:
  - id
  - trend (jsonb)
  - lang (text)
  - updated_at (timestamptz)
- Create a Supabase Storage bucket named checkpoints to store the trend checkpoints, which the server uses to recover the trends after a restart.
- Install the dependencies:
    npm install
  or
    bun install
- Start the WebSocket client to listen to posts:
    npm run start
  or
    bun server.ts
Now make GET requests to http://localhost:8003/trending to get the trends.
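For example, the endpoint can be queried from a small TypeScript client. This is a minimal sketch: the helper names trendingUrl and getTrends and the FetchLike type are hypothetical, and the response is typed as unknown because the JSON shape is not documented here.

```typescript
// Minimal fetch-like signature so the HTTP call can be swapped out (e.g. in tests).
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Default address taken from the instructions above.
const DEFAULT_BASE = "http://localhost:8003";

// Build the /trending URL, tolerating a trailing slash on the base address.
function trendingUrl(base: string = DEFAULT_BASE): string {
  return base.replace(/\/+$/, "") + "/trending";
}

// GET the trends and return the parsed JSON body (shape not assumed).
async function getTrends(
  fetchImpl: FetchLike,
  base: string = DEFAULT_BASE,
): Promise<unknown> {
  const res = await fetchImpl(trendingUrl(base));
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}
```

On a runtime with a global fetch (recent Node.js or Bun), calling getTrends(fetch) and logging the result should print whatever the local server returns.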
The project extracts words, phrases, and hashtags from posts using NLP techniques. The extraction is done through the extractWords, extractSentences, and extractHashtags functions.
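As a rough illustration of this step, the sketch below pulls hashtags, words, and n-grams out of a post with plain regular expressions. It is not the project's extractWords, extractSentences, or extractHashtags code and it does not use Compromise. The names sketchHashtags, sketchWords, and sketchNgrams are hypothetical, and the ASCII-only patterns are a simplifying assumption.

```typescript
// Hashtags: '#' followed by ASCII word characters, lowercased.
function sketchHashtags(text: string): string[] {
  return (text.match(/#\w+/g) ?? []).map((tag) => tag.toLowerCase());
}

// Words: lowercase tokens, with URLs and hashtags stripped first.
function sketchWords(text: string): string[] {
  const cleaned = text
    .replace(/https?:\/\/\S+/g, " ")
    .replace(/#\w+/g, " ")
    .toLowerCase();
  return cleaned.match(/[a-z0-9']+/g) ?? [];
}

// N-grams: every run of n consecutive words, joined by single spaces.
function sketchNgrams(words: string[], n: number): string[] {
  const out: string[] = [];
  for (let i = 0; i + n <= words.length; i++) {
    out.push(words.slice(i, i + n).join(" "));
  }
  return out;
}

const post = "Loving the #Bluesky trends today https://bsky.app #AI";
const words = sketchWords(post);
console.log(sketchHashtags(post)); // [ '#bluesky', '#ai' ]
console.log(words);                // [ 'loving', 'the', 'trends', 'today' ]
console.log(sketchNgrams(words, 2)); // [ 'loving the', 'the trends', 'trends today' ]
```

Counting how often each word, phrase, or hashtag appears across many posts is then what turns these tokens into trend candidates.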
The extracted content is filtered to remove stopwords, blacklist words, and irrelevant content. The filtering is done through the filterWords and filterSentences functions.
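The filtering idea can be sketched as follows. The stopword and blacklist entries, the minimum word length, and the names sketchFilterWords and sketchFilterPhrases are all assumptions made for illustration; the project's filterWords and filterSentences may apply different rules.

```typescript
// Hypothetical lists; a real deployment would use much larger, per-language sets.
const STOPWORDS = new Set(["the", "a", "an", "and", "of", "to", "is", "in"]);
const BLACKLIST = new Set(["spamword"]);

// Keep words that are long enough and appear in neither list.
function sketchFilterWords(words: string[], minLength: number = 3): string[] {
  return words.filter(
    (w) => w.length >= minLength && !STOPWORDS.has(w) && !BLACKLIST.has(w),
  );
}

// Keep a phrase unless it contains a blacklisted word or is made only of stopwords.
function sketchFilterPhrases(phrases: string[]): string[] {
  return phrases.filter((phrase) => {
    const parts = phrase.split(" ");
    const hasBlacklisted = parts.some((w) => BLACKLIST.has(w));
    const hasContentWord = parts.some((w) => !STOPWORDS.has(w));
    return !hasBlacklisted && hasContentWord;
  });
}

console.log(sketchFilterWords(["loving", "the", "trends", "to", "spamword"]));
// [ 'loving', 'trends' ]
console.log(sketchFilterPhrases(["of the", "the trends", "buy spamword"]));
// [ 'the trends' ]
```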
The extracted topics are classified using Google Gemini AI. The classification is done through the classifyText function.
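One possible way to classify a topic is to call the Gemini REST generateContent endpoint directly, as sketched below. This is not the project's classifyText implementation. The category list, the prompt wording, the model name gemini-1.5-flash, and the names PostFn, buildRequestBody, parseCategory, and sketchClassify are assumptions, and the HTTP call is injected so the sketch stays independent of any particular client.

```typescript
// Subset of the generateContent response that this sketch reads.
interface GeminiResponse {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Caller-supplied function that POSTs a JSON body and returns the parsed response.
type PostFn = (url: string, body: string) => Promise<GeminiResponse>;

// Hypothetical category set.
const CATEGORIES = ["sports", "politics", "technology", "entertainment", "other"];

// Build the JSON request body: a single user prompt asking for one category.
function buildRequestBody(topic: string): string {
  const prompt =
    `Classify the topic "${topic}" into exactly one of: ` +
    `${CATEGORIES.join(", ")}. Reply with the category only.`;
  return JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] });
}

// Read the first candidate's text; fall back to "other" on anything unexpected.
function parseCategory(response: GeminiResponse): string {
  const text = response.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
  const label = text.trim().toLowerCase();
  return CATEGORIES.indexOf(label) !== -1 ? label : "other";
}

async function sketchClassify(
  topic: string,
  apiKey: string,
  post: PostFn,
  model: string = "gemini-1.5-flash", // assumed model name
): Promise<string> {
  const url =
    `https://generativelanguage.googleapis.com/v1beta/models/${model}` +
    `:generateContent?key=${encodeURIComponent(apiKey)}`;
  return parseCategory(await post(url, buildRequestBody(topic)));
}
```

The post argument would typically wrap fetch with method POST and a Content-Type of application/json, passing the GOOGLE_API_KEY value from the .env file as apiKey.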
The trends are stored in Supabase. The storage is done through the services/saveTrends.ts file.
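To make the table layout concrete, the sketch below builds a row matching the trends table (trend, lang, updated_at) and the corresponding insert request for Supabase's REST interface. How services/saveTrends.ts actually writes the data is not shown here. The names TrendRow, buildTrendRow, and buildInsertRequest are hypothetical, and the id column is left out on the assumption that the database generates it.

```typescript
// One row of the trends table; id is omitted (assumed to be auto-generated).
interface TrendRow {
  trend: unknown;     // stored in the jsonb column
  lang: string;       // text column, e.g. "en"
  updated_at: string; // ISO 8601 timestamp for the timestamptz column
}

function buildTrendRow(
  lang: string,
  trends: unknown,
  now: Date = new Date(),
): TrendRow {
  return { trend: trends, lang, updated_at: now.toISOString() };
}

// Describe a REST insert into the trends table without sending it.
function buildInsertRequest(
  supabaseUrl: string,
  serviceKey: string,
  row: TrendRow,
) {
  return {
    url: `${supabaseUrl.replace(/\/+$/, "")}/rest/v1/trends`,
    method: "POST",
    headers: {
      apikey: serviceKey,
      Authorization: `Bearer ${serviceKey}`,
      "Content-Type": "application/json",
      Prefer: "return=minimal",
    },
    body: JSON.stringify(row),
  };
}

const row = buildTrendRow(
  "en",
  [{ topic: "example", count: 3 }], // illustrative payload; the real trend shape is not specified
  new Date("2024-01-01T00:00:00Z"),
);
console.log(row.updated_at); // 2024-01-01T00:00:00.000Z
```

The SUPABASE_URL and SVC_KEY values from the .env file would be passed as supabaseUrl and serviceKey.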
The project also has a public API to get the trends. The API documentation is available at https://github.com/Rafael-BD/Bsky-Trends-API.
Contributions are welcome! Feel free to open issues and pull requests.