# Project: Text/Langchain Portfolio (Optional)

**Instructions for Students:**

Please carefully follow these steps to complete and submit your assignment:

1. **Completing the Assignment**: Complete every task in the provided assignment, and engage thoroughly with each one.

2. **Creating a Google Drive Folder**: If you don't already have a folder for collecting assignments, create a new one in your Google Drive. It will serve as the repository for all your completed assignment files, keeping your work organized and easy to access.

3. **Uploading Completed Assignment**: Once you have completed the assignment, upload all necessary files (code, reports, and related documents) to that Google Drive folder. Save the folder link in the 'Student Identity' section, and also pass it as the last parameter of the provided `submit` function.

4. **Sharing Folder Link**: Share the link to your assignment Google Drive folder. This is required for the submission and evaluation of your assignment.

5. **Setting Permission to Public**: Make sure your **Google Drive folder is set to public** so that your instructor can access your solutions and assess your work.

Adhering to these procedures will facilitate a smooth assignment process for you and the reviewers.

**Description:**

Welcome to your text/LangChain portfolio project assignment for AI Bootcamp. In this project you will apply what you've learned so far to real-world applications.

You are free to create anything, as long as it is a text-based application or model.

Some ideas for your application:
* A LangChain-based summarization application that takes a PDF file, or text entered in a text box, and produces a summary of its content.
* A text generator application that takes a direction in the form of a short sentence and generates 3 to 5 paragraphs of text (or more) based on it.
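
As a starting point for the summarization idea, here is a minimal sketch of a map-reduce flow in plain Python: split the text, summarize each piece, then summarize the combined partial summaries. The helper names `split_text`, `map_reduce_summary`, and `first_sentence` are hypothetical, and `first_sentence` is only a toy stand-in for whatever LLM call you end up using.

```python
from typing import Callable, List


def split_text(text: str, chunk_size: int = 1000) -> List[str]:
    """Split text on whitespace into chunks of roughly chunk_size characters.

    A single word longer than chunk_size is kept whole.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > chunk_size and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def map_reduce_summary(
    text: str, summarize: Callable[[str], str], chunk_size: int = 1000
) -> str:
    """Summarize each chunk (map), then summarize the joined summaries (reduce)."""
    chunks = split_text(text, chunk_size)
    if not chunks:
        return ""
    partial = [summarize(chunk) for chunk in chunks]
    if len(partial) == 1:
        return partial[0]
    return summarize(" ".join(partial))


def first_sentence(text: str) -> str:
    """Toy stand-in for an LLM call (assumption): keep only the first sentence."""
    return text.split(".")[0].strip() + "."
```

In a real application you would pass a function that calls your model instead of `first_sentence`; the splitting and combining logic stays the same.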


For submission, please upload the model and application to Hugging Face or your own GitHub account.

Your submission will be graded and can add up to 15 points to your final score.

Remember, the key to mastering these concepts is practice. Do apply your knowledge, and don't hesitate to ask questions if you encounter any difficulties. Good luck!

## Installing and Importing the `rggrader` Package

In [2]:
%pip install rggrader
from rggrader import submit_image
from rggrader import submit

Collecting rggrader
  Downloading rggrader-0.1.6-py3-none-any.whl (2.5 kB)
Installing collected packages: rggrader
Successfully installed rggrader-0.1.6


## Working Space

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
!pip install huggingface_hub

Collecting huggingface_hub
  Downloading huggingface_hub-0.18.0-py3-none-any.whl (301 kB)
Installing collected packages: huggingface_hub
Successfully installed huggingface_hub-0.18.0


In [5]:
# Install libraries
!pip install langchain openai python-dotenv
!pip install gradio

Collecting langchain
  Downloading langchain-0.0.316-py3-none-any.whl (1.9 MB)
Collecting openai
  Downloading openai-0.28.1-py3-none-any.whl (76 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.1-py3-none-any.whl (27 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langsmith<0.1.0,>=0.0.43 (from langchain)
  Downloading langsmith-0.0.44-py3-none-any.whl (40 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json

In [6]:
# Import libraries
from langchain.document_loaders import TextLoader, UnstructuredHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from dotenv import load_dotenv, find_dotenv
import os
import openai
import sys

In [7]:
!pip install pinecone-client tiktoken
!pip install faiss-cpu
!pip install transformers

Collecting pinecone-client
  Downloading pinecone_client-2.2.4-py3-none-any.whl (179 kB)
Collecting tiktoken
  Downloading tiktoken-0.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Collecting loguru>=0.5.0 (from pinecone-client)
  Downloading loguru-0.7.2-py3-none-any.whl (62 kB)
Collecting dnspython>=2.0.0 (from pinecone-client)
  Downloading dnspython-2.4.2-py3-none-any.whl (300 kB)
Installing collected packages: loguru, dnspython, tiktoken, pinecone-client
ERROR: pip's dep

In [8]:
from huggingface_hub import notebook_login
notebook_login()


In [9]:
# Write your code here
loader = TextLoader("/content/AI_Text_Portfolio/animal-farm.txt")

In [10]:
from dotenv import load_dotenv, find_dotenv

# Read a local .env file first, if one exists. By default load_dotenv does not
# override variables that are already set, so it has to run before any fallback.
_ = load_dotenv(find_dotenv())

# Fall back to a key set directly here only when the environment has none
os.environ.setdefault('OPENAI_API_KEY', '...')

openai.api_key = os.environ['OPENAI_API_KEY']

In [11]:
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=50)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(docs, embeddings)
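
To build intuition for `chunk_size` and `chunk_overlap` in the cell above, here is a simplified sliding-window sketch. It is an illustration only: `sliding_chunks` is a hypothetical helper, and the real `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and line breaks rather than at fixed character offsets.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


# Small numbers make the overlap visible: each chunk repeats the last
# two characters of the previous one.
print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

In this simplified model, `chunk_size=100` with `chunk_overlap=50` advances the window by 50 characters at a time, so most of the text ends up in two chunks and the number of embeddings roughly doubles compared with no overlap.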

In [14]:
OpenAIModel = "gpt-3.5-turbo"
llm = ChatOpenAI(model=OpenAIModel, temperature=0.1)

qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())
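
Conceptually, the retrieval step above finds the stored chunks closest to the question and hands them to the model as context. The sketch below imitates that with toy two-dimensional vectors in plain Python. The names `cosine`, `top_k`, and `build_prompt` are hypothetical, cosine similarity is an illustrative choice (the FAISS index may use a different distance metric), and the prompt wording is made up rather than the one LangChain uses.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    indexed: List[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[str]:
    """Return the texts of the k stored chunks most similar to the query vector."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, contexts: List[str]) -> str:
    """Place the retrieved chunks ahead of the question in a single prompt."""
    context_block = "\n".join(contexts)
    return f"Use the context to answer.\n\nContext:\n{context_block}\n\nQuestion: {question}"


# Toy index: (chunk text, made-up embedding)
index = [
    ("old Major's speech", [1.0, 0.0]),
    ("the windmill", [0.0, 1.0]),
    ("the farm rules", [0.7, 0.7]),
]
print(build_prompt("What did old Major say?", top_k([1.0, 0.1], index, k=2)))
```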

In [15]:
import json

model_name = "gpt-3.5-turbo"  # Replace with your actual model name
model_name_config = {"model_name": model_name}

json_file_path = "/content/AI_Text_Portfolio/model_name_config.json"
with open(json_file_path, 'w') as json_file:
    json.dump(model_name_config, json_file)


In [16]:
query = "How many nights passed when old major die?"
qa.run(query)

'Three nights passed when old Major died.'

In [17]:
# Write your Gradio implementation here
import gradio as gr

def prompt(query):
    return qa.run(query)

iface = gr.Interface(prompt,'text','text')
iface.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://80f0b81f8961ba9731.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
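
Because the share link is public, it can help to guard the handler against empty input and failed API calls. The sketch below shows one possible approach; `make_handler` is a hypothetical helper, not part of Gradio or LangChain.

```python
from typing import Callable


def make_handler(answer_fn: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a question-answering function with basic input and error handling."""

    def handler(query: str) -> str:
        query = (query or "").strip()
        if not query:
            return "Please enter a question."
        try:
            return answer_fn(query)
        except Exception as exc:
            # Show the failure in the output box instead of leaving it blank
            return f"Sorry, the request failed: {exc}"

    return handler
```

Assuming the `qa` chain defined earlier, this could be wired in as `gr.Interface(make_handler(qa.run), 'text', 'text')`.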




## Submit Notebook

In [55]:
portfolio_link = "https://github.com/AuChrist/AI-Text-Processor"

question_id = "01_text_portfolio_link"
submit(student_id, name, assignment_id, str(portfolio_link), question_id, drive_link)

'Assignment successfully submitted'

# FIN