<a href="https://colab.research.google.com/github/charliezhou1/AutoPostCreation/blob/main/Auto_post_creation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Auto Post Creation with GPT-4 and LlamaIndex

This project builds an AI-powered content generator using **RAG (Retrieval-Augmented Generation)** with **OpenAI GPT-4**, **LangChain**, and **LlamaIndex**. It reads corporate reports from a directory, indexes them with embeddings, and uses an LLM to generate content such as LinkedIn posts or summaries.

---

## 🚀 Features

- Loads **corporate report documents** from Google Drive
- Uses **OpenAI GPT-4** for post generation
- Embeds documents using **text-embedding-3-small**
- Built on **LlamaIndex** + **LangChain**
- RAG-powered generation for relevant and grounded content

---

In [None]:
!pip install llama-index -q
!pip install langchain -q
!pip install langchain_experimental -q


import os
import nest_asyncio

nest_asyncio.apply()

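# Set OpenAI API Key (replace the "*" placeholder with your own key)
os.environ["OPENAI_API_KEY"] = "*"
# Confirm the key is set without printing the secret itself
print("OPENAI_API_KEY set:", bool(os.getenv("OPENAI_API_KEY")))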

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

# Setup LLM model and embeddings
Settings.llm = OpenAI(model='gpt-4-0125-preview', temperature=0.2)
Settings.embed_model = OpenAIEmbedding(model='text-embedding-3-small')
Settings.chunk_size = 1024


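Hardcoding the API key in a cell makes it easy to leak when the notebook is shared. One possible alternative, sketched below, reads the key from the environment and prompts for it only when it is missing. `load_api_key` and `mask_secret` are hypothetical helper names introduced for this example, and `"sk-example-1234"` is a made-up placeholder, not a real key.

```python
import os
from getpass import getpass

def load_api_key(env_var="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        key = prompt_fn(f"Enter {env_var}: ")
        os.environ[env_var] = key
    return key

def mask_secret(secret, visible=4):
    """Show only the last few characters so logs never contain the full key."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]

# Demo with a made-up placeholder value.
print(mask_secret("sk-example-1234"))
```

In a notebook, calling `load_api_key()` once before configuring `Settings.llm` would prompt interactively, so the key is never stored in the saved file.
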
In [None]:
from google.colab import drive
drive.mount('/content/drive')

data_dir = '/content/drive/MyDrive'  # Replace with your path
os.makedirs(f'{data_dir}/RAG/data/corporate_reports/', exist_ok=True)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
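
The cell above assumes a Colab session with Google Drive mounted. For runs outside Colab, a small fallback like the following sketch could pick the first directory that exists. `resolve_data_dir` is a hypothetical helper, and the temporary-directory fallback is an assumption for local experiments, not part of the original notebook.

```python
import os
import tempfile

def resolve_data_dir(candidates):
    """Return the first candidate directory that exists, else None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None

# '/content/drive/MyDrive' is the Colab Drive mount used above;
# the temp-dir fallback is an assumption for local runs.
local_fallback = tempfile.mkdtemp(prefix="rag_data_")
data_root = resolve_data_dir(["/content/drive/MyDrive", local_fallback])
reports_dir = os.path.join(data_root, "RAG", "data", "corporate_reports")
os.makedirs(reports_dir, exist_ok=True)
print("Using reports directory:", reports_dir)
```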


In [None]:
from llama_index.core import SimpleDirectoryReader

def load_reports_data():
    reports_path = f'{data_dir}/RAG/data/corporate_reports/'
    documents = SimpleDirectoryReader(reports_path).load_data()
    return documents

reports_documents = load_reports_data()
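
Before loading, it can help to check what is actually in the reports folder, since an empty or mistyped path is a common cause of loader failures. The sketch below counts files by extension using only the standard library. `summarize_reports` is a hypothetical helper, and the file names in the demo are placeholders.

```python
import os
import tempfile
from collections import Counter

def summarize_reports(folder):
    """Count files in `folder` by lowercase extension (non-recursive)."""
    counts = Counter()
    for name in sorted(os.listdir(folder)):
        if os.path.isfile(os.path.join(folder, name)):
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
    return dict(counts)

# Demo on a throwaway folder with placeholder files.
demo_dir = tempfile.mkdtemp()
for name in ["q1_report.pdf", "q2_report.PDF", "notes.txt"]:
    with open(os.path.join(demo_dir, name), "w") as fh:
        fh.write("placeholder")

summary = summarize_reports(demo_dir)
print(summary)
if not summary:
    print("No report files found; add documents before building the index.")
```

In the notebook, the same function could be pointed at `reports_path` just before calling `SimpleDirectoryReader`.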

In [None]:

from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage

PERSIST_INDEX_DIR = f'{data_dir}/RAG/storage/'

def get_or_create_index(index_name, documents):
    index_dir = f"{PERSIST_INDEX_DIR}{index_name}/"

    # Ensure directory exists (exist_ok makes a separate existence check unnecessary)
    os.makedirs(index_dir, exist_ok=True)

    if not os.listdir(index_dir):  # Check if the directory is empty
        print(f"Creating index for {index_name}")
        # Create and persist the index
        index = VectorStoreIndex.from_documents(documents)
        index.storage_context.persist(index_dir)
    else:
        print(f"Loading existing index for {index_name}")
        # Load index from storage
        storage_context = StorageContext.from_defaults(persist_dir=index_dir)
        index = load_index_from_storage(storage_context)

    return index

corporate_index = get_or_create_index("corporate_reports", reports_documents)


Loading existing index for corporate_reports
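
Note that `get_or_create_index` reloads the persisted index whenever the storage directory is non-empty, so reports added later would not be picked up. One way to detect that, sketched here under stated assumptions, is to store a fingerprint of the reports folder next to the index and rebuild when it changes. `folder_fingerprint`, `index_is_stale`, and `save_fingerprint` are hypothetical helpers, and the fingerprint file is not something LlamaIndex writes itself.

```python
import hashlib
import os
import tempfile

def folder_fingerprint(folder):
    """Hash file names, sizes, and modification times under `folder`."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(folder):
        dirs.sort()  # make traversal order deterministic
        for name in sorted(files):
            path = os.path.join(root, name)
            stat = os.stat(path)
            rel = os.path.relpath(path, folder)
            digest.update(f"{rel}|{stat.st_size}|{stat.st_mtime_ns}\n".encode())
    return digest.hexdigest()

def index_is_stale(folder, fingerprint_file):
    """True when no fingerprint is stored or the folder changed since."""
    if not os.path.exists(fingerprint_file):
        return True
    with open(fingerprint_file) as fh:
        return fh.read().strip() != folder_fingerprint(folder)

def save_fingerprint(folder, fingerprint_file):
    """Record the current state of `folder` after (re)building the index."""
    with open(fingerprint_file, "w") as fh:
        fh.write(folder_fingerprint(folder))

# Demo on throwaway directories.
docs = tempfile.mkdtemp()
state = os.path.join(tempfile.mkdtemp(), "fingerprint.txt")
with open(os.path.join(docs, "report.txt"), "w") as fh:
    fh.write("v1")
print(index_is_stale(docs, state))  # no fingerprint stored yet
save_fingerprint(docs, state)
print(index_is_stale(docs, state))  # folder unchanged since save
with open(os.path.join(docs, "new_report.txt"), "w") as fh:
    fh.write("v2")
print(index_is_stale(docs, state))  # a new report was added
```

A stale result could then trigger `VectorStoreIndex.from_documents(documents)` and a fresh `persist` call instead of loading from storage.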


In [None]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# Define a query engine for corporate reports
corporate_engine = corporate_index.as_query_engine(similarity_top_k=5)

# Create a tool for querying corporate reports
corporate_query_tool = QueryEngineTool(
    query_engine=corporate_engine,
    metadata=ToolMetadata(
        name="Corporate_Reports",
        description="Generates summaries and insights from corporate reports for LinkedIn posts."
    )
)



In [None]:

# Define post engines for different platforms.
# Each engine appends platform-specific style instructions to the caller's
# query instead of discarding it.
def linkedin_post_engine(query):
    return corporate_query_tool.query_engine.query(query)

def reddit_post_engine(query):
    # Customize summary for Reddit style (casual tone)
    query = f"{query} Write it as a Reddit post with a more casual tone."
    return corporate_query_tool.query_engine.query(query)

def twitter_post_engine(query):
    # Customize summary for Twitter style (concise)
    query = f"{query} Keep it concise, in a tweet-length format."
    return corporate_query_tool.query_engine.query(query)
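
The three engines differ only in the style instructions they add, so the per-platform differences could also live in a small table. The sketch below shows one way to do that and to cap the length of a generated post. `PLATFORM_STYLES`, `build_platform_prompt`, and `truncate_post` are hypothetical names, and the tone descriptions and character limits are illustrative assumptions that should be checked against each platform's current rules.

```python
PLATFORM_STYLES = {
    # Tone notes and character limits are illustrative assumptions.
    "linkedin": {"tone": "professional", "max_chars": 3000},
    "reddit": {"tone": "casual and conversational", "max_chars": 10000},
    "twitter": {"tone": "concise", "max_chars": 280},
}

def build_platform_prompt(platform, query):
    """Append platform-specific style instructions to the user's query."""
    style = PLATFORM_STYLES[platform.lower()]
    return (
        f"{query}\n"
        f"Write it for {platform} in a {style['tone']} style, "
        f"using at most {style['max_chars']} characters."
    )

def truncate_post(text, platform):
    """Trim generated text to the platform's character limit."""
    limit = PLATFORM_STYLES[platform.lower()]["max_chars"]
    if len(text) <= limit:
        return text
    return text[: limit - 1].rstrip() + "…"

print(build_platform_prompt("Twitter", "Summarize the latest corporate report."))
print(len(truncate_post("x" * 500, "Twitter")))
```

The prompt string returned by `build_platform_prompt` could be passed to `corporate_query_tool.query_engine.query(...)`, with `truncate_post` applied to the text of the response as a final safeguard.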

In [None]:
# Simple keyword-based router that selects a post engine by platform name
def router_engine(query):
    platform_query = query.lower()  # match platform names case-insensitively
    if "linkedin" in platform_query:
        return linkedin_post_engine(query)
    elif "reddit" in platform_query:
        return reddit_post_engine(query)
    elif "twitter" in platform_query:
        return twitter_post_engine(query)
    else:
        return "No platform specified."

# Example queries
linkedin_query = "Generate a LinkedIn post summarizing the key insights from the latest corporate report."
reddit_query = "Generate a Reddit post summarizing the key insights from the latest corporate report."
twitter_query = "Generate a Twitter post summarizing the key insights from the latest corporate report."

# Example routing of queries
linkedin_response = router_engine(linkedin_query)
reddit_response = router_engine(reddit_query)
twitter_response = router_engine(twitter_query)

print("LinkedIn Response:", linkedin_response)
print("Reddit Response:", reddit_response)
print("Twitter Response:", twitter_response)

LinkedIn Response: 🚀 Exciting Insights from Our Latest Corporate Report on AI! 🚀

We're thrilled to share groundbreaking findings from our recent survey on the state of AI in early 2024. With responses from 1,363 participants across various industries and regions, our research highlights the remarkable journey of AI and generative AI (gen AI) adoption and its burgeoning impact on businesses worldwide.

🌐 Key Highlights:
- A significant surge in gen AI adoption, with 981 organizations implementing AI in at least one business function.
- An impressive 878 organizations are now regularly using gen AI, showcasing its growing influence and utility in the business landscape.
- Our analysis reveals that high-performing companies are leveraging highly customized or bespoke gen AI solutions to capture unparalleled business value, moving beyond off-the-shelf solutions to achieve competitive differentiation.

🔍 Deep Dive into Practices:
- The survey sheds light on the practices of high performers