In [1]:
from langchain_groq import ChatGroq

In [2]:
llm = ChatGroq(
    temperature=0, 
    groq_api_key='<keep your api key>', 
    model_name="llama-3.1-70b-versatile"
)
response = llm.invoke("The first person to land on the sun was ...")
print(response.content)

It's not possible for a person to land on the sun.  The sun is a massive ball of hot, glowing gas and does not have a solid surface. Additionally, the temperatures on the sun are so high (reaching over 5,500°C or 10,000°F) that any object or person would immediately vaporize upon approaching the sun.
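
The cell above passes the API key as a string literal. A minimal sketch of reading it from the environment instead, so the key never lands in the notebook; the helper name `get_groq_api_key` and the variable name `GROQ_API_KEY` are assumptions made for this illustration, not something the notebook defines:

```python
import os


def get_groq_api_key(env=None, var_name="GROQ_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Hypothetical usage in the cell above (needs langchain_groq and a real key):
# llm = ChatGroq(
#     temperature=0,
#     groq_api_key=get_groq_api_key(),
#     model_name="llama-3.1-70b-versatile",
# )
```

Failing early with a named variable tends to be easier to debug than an authentication error raised on the first `invoke` call.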


In [9]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://www.iith.ac.in/news/2024/02/15/Summer-Undergraduate-Research-Exposure/")
page_data = loader.load().pop().page_content
print(page_data)

Summer Undergraduate Research Exposure (SURE) 2024 | IIT Hyderabad

Summer Undergraduate Research Exposure (SURE) 2024
Feb 15, 2024

IITH-SURE Internship 2024 Results are announced. Click here
Applications are invited for Summer Undergraduate Research Exposure (SURE) Internships 2024
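
Text extracted from a web page often carries long runs of blank lines and stray indentation, which cost prompt tokens without adding information. A small sketch of a clean-up step that could run on `page_data` before it is placed in a prompt; `clean_text` is a hypothetical helper, not part of LangChain:

```python
import re


def clean_text(text):
    """Collapse whitespace noise left over from HTML-to-text extraction."""
    # Strip leading and trailing whitespace on every line.
    lines = [line.strip() for line in text.splitlines()]
    # Drop empty lines so each remaining item sits on its own line.
    lines = [line for line in lines if line]
    # Squeeze runs of spaces or tabs inside a line down to a single space.
    lines = [re.sub(r"[ \t]+", " ", line) for line in lines]
    return "\n".join(lines)


sample = "\n\n\nOffice of\n          Academic Affairs\n\n\n   Feb 15,   2024\n"
print(clean_text(sample))
```

In the notebook this would be applied as `page_data = clean_text(page_data)`. It only normalises whitespace; navigation entries are kept, so the model still has to ignore them.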

In [13]:
from langchain_core.prompts import PromptTemplate

prompt_extract = PromptTemplate.from_template(
        """
        ### SCRAPED TEXT FROM WEBSITE:
        {page_data}
        ### INSTRUCTION:
        The scraped text is from the careers page of a website.
        Your job is to extract the job postings and return them in JSON format containing the 
        following keys: `role`, `experience`, `skills` and `description`.
        Only return the valid JSON.
        ### VALID JSON (NO PREAMBLE):    
        """
)

chain_extract = prompt_extract | llm 
res = chain_extract.invoke(input={'page_data':page_data})
type(res.content)

str

In [15]:
from langchain_core.output_parsers import JsonOutputParser

json_parser = JsonOutputParser()
json_res = json_parser.parse(res.content)
json_res

{'role': 'Senior Data Engineer, ITC',
 'experience': 'Minimum 7 years of work experience in building and maintaining data pipelines',
 'skills': ['Databricks',
  'Apache Spark',
  'Python',
  'SQL',
  'AWS/Azure',
  'Data storage technologies',
  'Distributed file systems',
  'Spark framework',
  'ETL tools',
  'Relational databases',
  'Data warehousing concepts',
  'Cloud platforms'],
 'description': 'Design, develop, enhance and maintain scalable ETL pipelines to process large volumes of data from various sources. Implement and manage data integration solutions using tools like Databricks, Snowflake, and other relevant technologies. Develop and optimize data models and schemas to support analytics and reporting needs. Write efficient and maintainable code in Python for data processing and transformations. Utilize Apache Spark for distributed data processing and large-scale data analytics. Work on converting business requirements into technical solutions Ensuring data quality and int

In [16]:
type(json_res)

dict
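
Here the parser returned a single `dict`, but a page that lists several postings may well come back as a list of dicts. A hedged sketch of normalising both shapes so later cells can always iterate; `as_job_list` is a name introduced for this illustration:

```python
def as_job_list(parsed):
    """Return parsed JSON as a list of job dicts, whichever shape the model chose."""
    if isinstance(parsed, dict):
        # A single posting: wrap it so callers can loop uniformly.
        return [parsed]
    if isinstance(parsed, list):
        # Several postings: keep only the entries that are dicts.
        return [item for item in parsed if isinstance(item, dict)]
    raise TypeError(f"Expected dict or list, got {type(parsed).__name__}")


jobs = as_job_list({"role": "Senior Data Engineer, ITC", "skills": ["Python", "SQL"]})
for job in jobs:
    print(job["role"], job["skills"])
```

With this in place, `for job in as_job_list(json_res):` works regardless of how many postings were extracted.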

In [19]:
import pandas as pd

df = pd.read_csv("my_portfolio.csv")
df

Unnamed: 0,Techstack,Links
0,"React, Node.js, MongoDB",https://example.com/react-portfolio
1,"Angular,.NET, SQL Server",https://example.com/angular-portfolio
2,"Vue.js, Ruby on Rails, PostgreSQL",https://example.com/vue-portfolio
3,"Python, Django, MySQL",https://example.com/python-portfolio
4,"Java, Spring Boot, Oracle",https://example.com/java-portfolio
5,"Flutter, Firebase, GraphQL",https://example.com/flutter-portfolio
6,"WordPress, PHP, MySQL",https://example.com/wordpress-portfolio
7,"Magento, PHP, MySQL",https://example.com/magento-portfolio
8,"React Native, Node.js, MongoDB",https://example.com/react-native-portfolio
9,"iOS, Swift, Core Data",https://example.com/ios-portfolio


In [18]:
pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Collecting pandas
  Using cached pandas-2.2.3-cp312-cp312-win_amd64.whl.metadata (19 kB)
Collecting pytz>=2020.1 (from pandas)
  Using cached pytz-2024.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas)
  Using cached tzdata-2024.2-py2.py3-none-any.whl.metadata (1.4 kB)
Using cached pandas-2.2.3-cp312-cp312-win_amd64.whl (11.5 MB)
Using cached pytz-2024.2-py2.py3-none-any.whl (508 kB)
Using cached tzdata-2024.2-py2.py3-none-any.whl (346 kB)
Installing collected packages: pytz, tzdata, pandas
Successfully installed pandas-2.2.3 pytz-2024.2 tzdata-2024.2
Note: you may need to restart the kernel to use updated packages.


In [20]:
import uuid
import chromadb

client = chromadb.PersistentClient('vectorstore')
collection = client.get_or_create_collection(name="portfolio")

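# Populate the collection only once; later runs reuse the persisted vectors.
if not collection.count():
    for _, row in df.iterrows():
        # Pass lists throughout so documents, metadatas and ids line up one-to-one.
        collection.add(documents=[row["Techstack"]],
                       metadatas=[{"links": row["Links"]}],
                       ids=[str(uuid.uuid4())])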
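
The loop above calls `collection.add` once per row. Since `add` also accepts parallel lists, the rows can be gathered first and inserted in one call. A sketch of that batching step, assuming each row exposes the `Techstack` and `Links` columns shown earlier; `build_chroma_batch` is a hypothetical helper:

```python
import uuid


def build_chroma_batch(rows):
    """Turn portfolio rows into the parallel lists expected by collection.add."""
    documents, metadatas, ids = [], [], []
    for row in rows:
        documents.append(row["Techstack"])          # text that gets embedded
        metadatas.append({"links": row["Links"]})   # payload returned by queries
        ids.append(str(uuid.uuid4()))               # random unique id per row
    return {"documents": documents, "metadatas": metadatas, "ids": ids}


rows = [
    {"Techstack": "React, Node.js, MongoDB", "Links": "https://example.com/react-portfolio"},
    {"Techstack": "Python, Django, MySQL", "Links": "https://example.com/python-portfolio"},
]
batch = build_chroma_batch(rows)
print(batch["documents"])
```

In the notebook this would be used as `collection.add(**build_chroma_batch(df.to_dict("records")))`.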

In [22]:
job = json_res
job['skills']

['Databricks',
 'Apache Spark',
 'Python',
 'SQL',
 'AWS/Azure',
 'Data storage technologies',
 'Distributed file systems',
 'Spark framework',
 'ETL tools',
 'Relational databases',
 'Data warehousing concepts',
 'Cloud platforms']

In [23]:
links = collection.query(query_texts=job['skills'], n_results=2).get('metadatas', [])
links

[[{'links': 'https://example.com/flutter-portfolio'},
  {'links': 'https://example.com/kotlin-android-portfolio'}],
 [{'links': 'https://example.com/vue-portfolio'},
  {'links': 'https://example.com/kotlin-android-portfolio'}],
 [{'links': 'https://example.com/ml-python-portfolio'},
  {'links': 'https://example.com/python-portfolio'}],
 [{'links': 'https://example.com/magento-portfolio'},
  {'links': 'https://example.com/wordpress-portfolio'}],
 [{'links': 'https://example.com/xamarin-portfolio'},
  {'links': 'https://example.com/ios-ar-portfolio'}],
 [{'links': 'https://example.com/android-portfolio'},
  {'links': 'https://example.com/magento-portfolio'}],
 [{'links': 'https://example.com/android-portfolio'},
  {'links': 'https://example.com/kotlin-android-portfolio'}],
 [{'links': 'https://example.com/flutter-portfolio'},
  {'links': 'https://example.com/vue-portfolio'}],
 [{'links': 'https://example.com/ml-python-portfolio'},
  {'links': 'https://example.com/magento-portfolio'}],
 [
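
The query returns one list of hits per skill, so the same portfolio link can appear several times. A minimal sketch of flattening that nested result into a de-duplicated list before it goes into a prompt; `unique_links` is a name chosen for this example, and the sample data repeats the first two result rows shown above:

```python
def unique_links(metadatas):
    """Flatten query metadatas into a list of distinct links, keeping first-seen order."""
    seen = set()
    ordered = []
    for hits in metadatas:        # one list of hits per query text
        for meta in hits:         # one metadata dict per hit
            link = meta.get("links")
            if link and link not in seen:
                seen.add(link)
                ordered.append(link)
    return ordered


sample = [
    [{"links": "https://example.com/flutter-portfolio"},
     {"links": "https://example.com/kotlin-android-portfolio"}],
    [{"links": "https://example.com/vue-portfolio"},
     {"links": "https://example.com/kotlin-android-portfolio"}],
]
print(unique_links(sample))
```

Passing `unique_links(links)` rather than the raw nested list keeps the prompt shorter and avoids showing the model the same URL repeatedly.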

In [24]:
prompt_email = PromptTemplate.from_template(
        """
        ### JOB DESCRIPTION:
        {job_description}
        
        ### INSTRUCTION:
        You are Mohan, a business development executive at AtliQ. AtliQ is an AI & Software Consulting company dedicated to facilitating
        the seamless integration of business processes through automated tools. 
        Over the years, we have empowered numerous enterprises with tailored solutions, fostering scalability, 
        process optimization, cost reduction, and heightened overall efficiency. 
        Your job is to write a cold email to the client regarding the job mentioned above describing the capability of AtliQ 
        in fulfilling their needs.
        Also add the most relevant ones from the following links to showcase AtliQ's portfolio: {link_list}
        Remember you are Mohan, BDE at AtliQ. 
        Do not provide a preamble.
        ### EMAIL (NO PREAMBLE):
        
        """
        )

chain_email = prompt_email | llm
res = chain_email.invoke({"job_description": str(job), "link_list": links})
print(res.content)

Subject: Expert Data Engineering Solutions for Scalable ETL Pipelines

Dear Hiring Manager,

I came across your job posting for a Senior Data Engineer, ITC, and I'm excited to introduce AtliQ, an AI & Software Consulting company that can help you build and maintain scalable ETL pipelines. With our expertise in data engineering, we can support your organization in processing large volumes of data from various sources.

Our team has extensive experience in designing, developing, and enhancing ETL pipelines using tools like Databricks, Apache Spark, and Python. We've worked on numerous projects that involve data integration, data modeling, and data warehousing concepts. Our expertise in cloud platforms, including AWS and Azure, ensures seamless data processing and storage.

At AtliQ, we've developed a range of solutions that can help you achieve your data engineering goals. Our portfolio includes projects that showcase our capabilities in:

* Machine Learning and Python: https://example.c