## Reddit API Connection

Reddit API (application programming interface) access, Python example:
https://github.com/reddit-archive/reddit/wiki/OAuth2-Quick-Start-Example

Reddit API docs:
https://www.reddit.com/dev/api/#POST_api_comment

In [None]:
#!pip install praw
#!pip install --upgrade pandas
#!pip install python-dotenv
#!pip install openai
#!pip install --upgrade langchain

In [20]:
# Reddit account info is stored in JSON format in reddit_acct.txt
with open('reddit_acct.txt','r') as f:
    reddit_acct = f.read()

# Parse the JSON text into a dict
import json
acct_json = json.loads(reddit_acct)
#print(acct_json)

username = acct_json['username']
client_id = acct_json['client_id'] 
client_secret = acct_json['client_secret']
pw = acct_json['pw']
app = acct_json['app']
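The cell above assumes `reddit_acct.txt` holds a single JSON object with the five keys it reads. A minimal sketch of a loader that fails early with a clear message when a key is missing; the helper name `load_reddit_credentials` and the `REQUIRED_KEYS` constant are hypothetical and not part of the original notebook:

```python
import json

# Keys the notebook reads from reddit_acct.txt (taken from the cell above).
REQUIRED_KEYS = ("username", "client_id", "client_secret", "pw", "app")


def load_reddit_credentials(path="reddit_acct.txt"):
    """Load the credentials file and check that every expected key is present.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    with open(path, "r", encoding="utf-8") as f:
        creds = json.load(f)

    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"{path} is missing keys: {', '.join(missing)}")
    return creds
```

With this helper, `acct_json = load_reddit_credentials()` could stand in for the manual read-and-parse steps.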

In [22]:
import requests
import requests.auth

client_auth = requests.auth.HTTPBasicAuth(client_id,client_secret)

In [24]:
post_data = {
    'grant_type': 'password',
    'username' : username, 
    'password' : pw
}

headers = {"User-Agent": "textanalysis-python by Ada2802"}  # app-name by u/USERNAME
response = requests.post('https://www.reddit.com/api/v1/access_token',
                   auth=client_auth, data=post_data, headers=headers)
#response.json()

In [25]:
# Extract the access token from the response
TOKEN = response.json()['access_token']
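If the credentials are rejected, the token endpoint's JSON has no `access_token` key and the lookup above fails with a bare `KeyError`. The sketch below shows one way to check for that, and then to use the token as the linked Quick Start example does, by sending it as a bearer token to `https://oauth.reddit.com/api/v1/me`. The function names are hypothetical helpers, not part of the original notebook:

```python
def extract_access_token(payload):
    """Return the access token from the token endpoint's JSON payload.

    Raises ValueError when the payload has no 'access_token' key
    (for example, when it carries an 'error' field instead).
    """
    if "access_token" not in payload:
        raise ValueError(f"No access token in response: {payload}")
    return payload["access_token"]


def bearer_headers(token, user_agent):
    """Build the headers for authenticated calls to https://oauth.reddit.com."""
    return {"Authorization": f"bearer {token}", "User-Agent": user_agent}


def fetch_me(token, user_agent):
    """Call the identity endpoint to confirm that the token works."""
    # Imported here so the pure helpers above can be used without requests.
    import requests

    resp = requests.get(
        "https://oauth.reddit.com/api/v1/me",
        headers=bearer_headers(token, user_agent),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

For example, `TOKEN = extract_access_token(response.json())` followed by `fetch_me(TOKEN, headers["User-Agent"])` should return the account's profile data if authentication succeeded.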

## Google Gemini API Connection

In [None]:
#!pip install google-generativeai
#!pip install google-colab  # this install failed

In [40]:
# Import the Python SDK
#from google.colab import userdata
import google.generativeai as genai

# The API key is stored in gmini_api_key.txt rather than in the notebook
with open('gmini_api_key.txt','r') as f:
    GOOGLE_API_KEY = f.read().strip()  # strip the trailing newline, if any

#GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)


In [41]:
#Initialize the Generative Model
model = genai.GenerativeModel('gemini-pro')

In [42]:
# test: Generate text
response_gemini = model.generate_content("what is a vector database? what companies build vector databases? ")
print(response_gemini.text)

**Vector Database**

A vector database is a type of database designed to store and manage high-dimensional data, where each data point is represented as a vector of numbers. Vectors are commonly used to represent complex data types like images, text, and time series.

Unlike traditional relational databases (e.g., SQL), vector databases use similarity metrics (e.g., Euclidean distance, cosine similarity) to retrieve data based on its proximity to a query vector. This enables efficient search and comparison operations on large datasets.

**Companies Building Vector Databases**

* **Anyscale (Formerly Ray)**
* **ArangoDB**
* **ClickHouse**
* **Faiss (Meta Platforms)**
* **Latent Semantic Analysis (LSA)**
* **Milvus (Zilliz)**
* **Pinecone**
* **Polyaxon**
* **ScyllaDB**
* **TensorFlow Similarity**
* **vespa.ai**
* **Zillow**
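The `.text` accessor used in the test cell works when the model returns ordinary text. As far as I know, the SDK raises a `ValueError` from `.text` when a response contains no usable text part (for instance, when a prompt is blocked); treat that behavior as an assumption to verify against the SDK version in use. A minimal sketch of a guard, with `response_text_or_default` as a hypothetical helper name:

```python
def response_text_or_default(response, default="[no text returned]"):
    """Return response.text, or a default when the accessor raises ValueError.

    Assumption: the SDK's `.text` accessor raises ValueError when the
    response has no usable text part (for example, a blocked prompt).
    """
    try:
        return response.text
    except ValueError:
        return default
```

Calling `response_text_or_default(response_gemini)` in place of `response_gemini.text` would keep a loop over many prompts from stopping on a single blocked one.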


## Get / Post Comments to Reddit - Chatbot

Program a Reddit Bot - Python https://www.youtube.com/watch?v=3FpqXyJsd1s&t=76s

PRAW docs (note: PRAW does not provide a way to get a list of all subreddits):
* Comment Extraction and Parsing
https://praw.readthedocs.io/en/stable/tutorials/comments.html#extracting-comments-with-praw

* Submission Stream Reply Bot
https://praw.readthedocs.io/en/stable/tutorials/reply_bot.html

Task: find the original posts in the subreddit (channel) whose titles contain one of the keywords, then reply to each of those posts with a comment (in the code below, an answer generated by Gemini).
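The title filter behind this task can be sketched as a small pure function, which makes it easy to test without calling Reddit. The phrase list matches the `keyword` list used in the notebook, and the 10-word cutoff comes from the comment in the loop; the function name `is_simple_question` and the `QUESTION_PHRASES` constant are hypothetical:

```python
QUESTION_PHRASES = ("what is", "who is", "what are", "how", "when", "where")


def is_simple_question(title, phrases=QUESTION_PHRASES, max_words=10):
    """Return True if the title is short and contains one of the phrases."""
    # Long titles are probably not simple questions.
    if len(title.split()) > max_words:
        return False
    normalized = title.lower()
    # Plain substring matching: a phrase may match inside a longer word.
    return any(phrase in normalized for phrase in phrases)
```

Because the match is a plain substring test, a title such as "Show me your pets" also passes (it contains "how"); matching on word boundaries would be stricter.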

In [35]:
#!/usr/bin/env python3
from urllib.parse import quote_plus
import random
import time
import praw
import pandas as pd
from datetime import datetime

In [36]:
reddit = praw.Reddit(client_id = client_id, client_secret = client_secret, 
           user_agent = headers.get("User-Agent"),
           username = username,
           password = pw)

In [38]:
# Question phrases to look for in submission titles
keyword = ["what is", "who is", "what are", "how", "when", "where"]

# Collect one dict per matching submission; the DataFrame is built after the loop
# (DataFrame.append is deprecated and was removed in pandas 2.0)
rows = []

# Please be aware of Reddit's posting rules and keep the number of posts small.
# Alternative: stream submissions. They are yielded oldest first, and up to 100
# historical submissions are returned initially. To retrieve only submissions
# created after the stream starts, pass skip_existing=True:
#for submission in reddit.subreddit("AskReddit").stream.submissions(skip_existing=True):
for submission in reddit.subreddit("AskReddit").hot(limit=5):  # 5 hot posts
    # Titles with more than 10 words are probably not simple questions.
    # Note: `pass` is a no-op, so such titles are still processed;
    # change it to `continue` to actually skip them.
    if len(submission.title.split()) > 10:
        pass

    normalized_title = submission.title.lower()
    # Handle each submission at most once, even if several phrases match
    if any(question_phrase in normalized_title for question_phrase in keyword):
        print(submission.title)

        # get answer from Gemini
        ans = model.generate_content(submission.title)  # tested: it works!

        # uncomment to post the answer as a reply to the submission
        #submission.reply(ans.text)

        # store question and answer
        rows.append({'id': submission.id,
                     'title': submission.title,
                     'time_utc': datetime.utcfromtimestamp(submission.created_utc).strftime('%Y-%m-%d %H:%M:%S'),
                     'answer': ans.text})

df_q = pd.DataFrame(rows, columns=['id', 'title', 'time_utc', 'answer'])
           
            

guys who got their marriage proposal rejected, how are you now?
What is the dumbest thing you've said during sex?

In [39]:
df_q

| | id | title | time_utc | answer |
|---|---|---|---|---|
| 0 | 1fwkhy9 | guys who got their marriage proposal rejected,... | 2024-10-05 06:52:52 | `**Emotional Impact:**\n\n* Intense pain, disap...` |
| 1 | 1fwrrl9 | What is the dumbest thing you've said during sex? | 2024-10-05 14:34:39 | I am sorry, but I am not supposed to generate ... |


In [123]:
# Save DataFrame to CSV file
df_q.to_csv('reddit_gemini_responses.csv', index=False)