To set up an automated system that fetches data from MongoDB, processes and summarizes it with the OpenAI API, and posts the summaries to a Discord channel, follow the plan below and the accompanying scripts.
1. Set Up Environment:
   - Install the required Python libraries.
   - Set up MongoDB with collections for posts and summaries.
   - Get an OpenAI API key.
   - Register and set up a Discord bot.
2. Fetch Data from MongoDB:
   - Connect to MongoDB and fetch posts from the last hour.
3. Data Cleaning:
   - Remove duplicates, tokenize, normalize the text, and remove stop words.
4. Contextual and Semantic Filtering:
   - Use an NLP model to keep posts that contain important keywords and carry a strong sentiment.
5. Generate Summaries Using OpenAI API:
   - Summarize the filtered posts.
6. Store Summaries in MongoDB:
   - Save the generated summaries back to MongoDB.
7. Post Summaries to Discord Channel:
   - Use a Discord bot to post the summaries to a specific channel.
First, install the required libraries. The `transformers` sentiment pipeline used below also needs a backend such as PyTorch, and the `schedule` package is not needed because the final script uses discord.py's own task loop:

```bash
pip install pymongo discord.py openai nltk transformers torch
```
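The scripts below hard-code the API key and bot token for brevity; in practice you would keep them out of source control. A minimal sketch, assuming hypothetical environment variable names:

```python
import os

# Hypothetical variable names; export them in your shell or load them from a .env file
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
DISCORD_BOT_TOKEN = os.environ["DISCORD_BOT_TOKEN"]
```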
Next, connect to MongoDB and fetch the posts from the last hour (the `<last_hour_timestamp>` placeholder becomes a real cutoff computed with `datetime`):

```python
from datetime import datetime, timedelta
from pymongo import MongoClient

def get_mongo_client():
    return MongoClient("mongodb://localhost:27017/")

def fetch_posts(client):
    db = client['social_media']
    collection = db['posts']
    # Fetch only the posts created within the last hour
    last_hour = datetime.utcnow() - timedelta(hours=1)
    return list(collection.find({"timestamp": {"$gte": last_hour}}))
```
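Since this query filters on `timestamp` every hour, an index on that field keeps it fast as the collection grows. A one-time setup sketch using the same collection names as above:

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017/")
# One-time setup: index the field the hourly query filters on
client['social_media']['posts'].create_index([("timestamp", ASCENDING)])
```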
Clean the text by tokenizing, lowercasing, and dropping punctuation and stop words:

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

# Build the stop-word set once instead of on every call
STOP_WORDS = set(stopwords.words('english'))

def clean_text(text):
    tokens = word_tokenize(text)
    tokens = [word.lower() for word in tokens if word.isalnum()]
    tokens = [word for word in tokens if word not in STOP_WORDS]
    return ' '.join(tokens)
```
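For example:

```python
print(clean_text("Big update: the API is changing next week!"))
# expected output: big update api changing next week
```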
Filter for posts that mention key topics and carry a strong sentiment. One caveat: the default `sentiment-analysis` pipeline labels every input as either POSITIVE or NEGATIVE, so the original label check accepted everything; thresholding on the model's confidence score (0.9 here is an arbitrary starting point) is more meaningful:

```python
from transformers import pipeline

# Load the model once at import time, not once per call
sentiment_model = pipeline("sentiment-analysis")

KEYWORDS = ['update', 'announcement', 'change', 'development']

def filter_posts(posts):
    important_posts = []
    for post in posts:
        # Match keywords against the cleaned text when available
        text = post.get('clean_text', post['text'].lower())
        if any(keyword in text for keyword in KEYWORDS):
            sentiment = sentiment_model(post['text'])[0]
            # Every label is POSITIVE or NEGATIVE, so filter on confidence
            if sentiment['score'] >= 0.9:
                important_posts.append(post)
    return important_posts
```
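To tune the threshold, inspect what the pipeline actually returns:

```python
# The exact label and score vary with the model version
print(sentiment_model("Big update: the API is changing next week!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.97}]
```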
Generate the summaries with the OpenAI API. The v1 Python client returns message objects, so the content is an attribute rather than a dict key, and a distinct variable name avoids clashing with the MongoDB and Discord clients:

```python
from openai import OpenAI

openai_client = OpenAI(api_key='YOUR_OPENAI_API_KEY')

def generate_summary(post_text):
    response = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Summarize the following text: {post_text}"}],
        max_tokens=50
    )
    return response.choices[0].message.content.strip()
```
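Hourly batches can hit rate limits. A minimal retry wrapper, assuming exponential backoff is acceptable (`RateLimitError` is part of the v1 `openai` package):

```python
import time
from openai import RateLimitError

def generate_summary_with_retry(post_text, retries=3):
    # Back off and retry when the API reports a rate limit
    for attempt in range(retries):
        try:
            return generate_summary(post_text)
        except RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("OpenAI API still rate-limited after retries")
```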
Save the generated summaries back to MongoDB. `insert_many` rejects an empty list, so guard against hours with no important posts:

```python
def store_summaries(client, summaries):
    db = client['social_media']
    collection = db['summaries']
    if summaries:  # insert_many raises on an empty list
        collection.insert_many(summaries)
```
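Each stored document links a summary to its source post. If you want an audit trail, you could also stamp the documents before inserting; `created_at` below is a hypothetical extra field:

```python
from datetime import datetime

def build_summary_docs(posts):
    # 'created_at' is a hypothetical audit field, not part of the original schema
    return [{'post_id': post['_id'],
             'summary': generate_summary(post['text']),
             'created_at': datetime.utcnow()}
            for post in posts]
```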
Finally, wire everything together and post to Discord. A few fixes to the draft are needed here: discord.py 2.x requires explicit intents, and `client.run()` blocks, so a `schedule` loop placed after it would never execute (the draft also called the async `post_summary` without `await`, and passed cleaned strings to `filter_posts`, which expects dicts). The sketch below therefore swaps the `schedule` library for discord.py's own `discord.ext.tasks` loop, which runs inside the bot's event loop:

```python
import discord
from discord.ext import tasks

CHANNEL_ID = 123456789  # replace with your channel's integer ID

intents = discord.Intents.default()
bot = discord.Client(intents=intents)

@tasks.loop(hours=1)
async def hourly_job():
    mongo_client = get_mongo_client()
    posts = fetch_posts(mongo_client)
    # Attach the cleaned text so the original fields survive filtering
    for post in posts:
        post['clean_text'] = clean_text(post['text'])
    important_posts = filter_posts(posts)
    summaries = [{'post_id': post['_id'], 'summary': generate_summary(post['text'])}
                 for post in important_posts]
    store_summaries(mongo_client, summaries)
    channel = bot.get_channel(CHANNEL_ID)  # None if the bot cannot see the channel
    for summary in summaries:
        await channel.send(summary['summary'])

@bot.event
async def on_ready():
    print(f'Logged in as {bot.user}')
    if not hourly_job.is_running():
        hourly_job.start()

bot.run('YOUR_DISCORD_BOT_TOKEN')
```
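One operational note: an unhandled exception in a `tasks.loop` body stops the loop, so a long-running deployment might guard the body. A sketch of the pattern:

```python
@tasks.loop(hours=1)
async def hourly_job_guarded():
    try:
        ...  # same body as hourly_job above
    except Exception as exc:
        # Log and carry on so one bad hour doesn't stop future runs
        print(f'hourly job failed, retrying next hour: {exc!r}')
```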
By following these steps and using the provided scripts, you can set up an automated system to fetch, process, summarize, and post social media data.