# Wikipedia Company Profile App

Take a company name as input and extract the following details from its Wikipedia page:


*   The founder of the company.
*   The year or date it was founded.
*   The current market capitalization of the company.
*   The number of employees.
*   A brief four-line summary of the company.

---
**Sample Output Below**

 {
    "Company_Name": "Tata Consultancy Services Limited",
    "Founder": "Tata Sons Limited",
    "Start_Date": "1968",
    "Revenue": 206858.05,
    "Employees": "601,546",
    "Summary": "TCS is an Indian multinational information technology (IT) services and consulting company headquartered in Mumbai. It is a part of the Tata Group and operates in 150 locations across 46 countries."
}
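The sample output doubles as an informal contract for the fields the app should return. A minimal sketch of checking a result against that contract with only the standard library follows. The helper `missing_or_mistyped` and the `EXPECTED_FIELDS` table are illustrative assumptions, not part of the notebook, and the summary text is abbreviated.

```python
import json

# Field names are taken from the sample output; the accepted types are an
# assumption based on the values shown there.
EXPECTED_FIELDS = {
    "Company_Name": str,
    "Founder": str,
    "Start_Date": str,
    "Revenue": (int, float),
    "Employees": str,
    "Summary": str,
}


def missing_or_mistyped(profile: dict) -> list[str]:
    """Return the names of fields that are absent or have an unexpected type."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in profile or not isinstance(profile[name], expected_type):
            problems.append(name)
    return problems


sample = json.loads("""
{
    "Company_Name": "Tata Consultancy Services Limited",
    "Founder": "Tata Sons Limited",
    "Start_Date": "1968",
    "Revenue": 206858.05,
    "Employees": "601,546",
    "Summary": "TCS is an Indian multinational IT services and consulting company."
}
""")

# An empty list means every expected field is present with an accepted type.
print(missing_or_mistyped(sample))
```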


# Packages

In [1]:
!pip install openai --quiet
!pip install langchain --quiet
!pip install langchain-cohere
!pip install langchain-community --quiet

In [2]:
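import os

# Option 1: hard-code the keys (not recommended).
# os.environ["OPENAI_API_KEY"] = "Your OpenAI Key"
# os.environ["COHERE_API_KEY"] = "Your Cohere Key"

# Option 2 (better): read the key from Colab's secret store.
from google.colab import userdata

# Assumes the secret is saved in Colab under the name "COHERE_API_KEY".
os.environ["COHERE_API_KEY"] = userdata.get("COHERE_API_KEY")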
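The cell above only works inside Google Colab. One possible way to make the key lookup portable is sketched below: it checks the process environment first and falls back to Colab secrets only when that module is importable. The function name `get_api_key` is hypothetical, and the secret is assumed to be stored under the same name as the environment variable.

```python
import os


def get_api_key(name: str) -> str:
    """Look up an API key in the environment, then in Colab secrets."""
    value = os.environ.get(name)
    if value:
        return value
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        raise RuntimeError(f"{name} is not set and Colab secrets are unavailable")
    value = userdata.get(name)
    if not value:
        raise RuntimeError(f"Colab secret {name!r} is empty")
    return value
```

A call such as `os.environ["COHERE_API_KEY"] = get_api_key("COHERE_API_KEY")` would then behave the same in Colab and in a local session where the variable is already exported.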

In [3]:
from langchain_community.llms import Cohere, OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

In [4]:
!pip install wikipedia


In [5]:
from langchain_community.document_loaders import WikipediaLoader

# Fetch the single best-matching Wikipedia page for the query.
loader = WikipediaLoader(query="TCS", load_max_docs=1)
company_documents = loader.load()
print(company_documents)

[Document(metadata={'title': 'Tata Consultancy Services', 'summary': "Tata Consultancy Services (TCS) is an Indian multinational technology company specializing in information technology services and consulting. Headquartered in Mumbai, it is a part of the Tata Group and operates in 150 locations across 46 countries. As of 2024, Tata Sons owned 71.74% of TCS, and close to 80% of Tata Sons' dividend income came from TCS.\nIn September 2021, TCS recorded a market capitalization of US$200 billion, becoming the first Indian IT company to achieve this valuation. In 2012, it was the world's second-largest user of U.S. H-1B visas.\n\n", 'source': 'https://en.wikipedia.org/wiki/Tata_Consultancy_Services'}, page_content='Tata Consultancy Services (TCS) is an Indian multinational technology company specializing in information technology services and consulting. Headquartered in Mumbai, it is a part of the Tata Group and operates in 150 locations across 46 countries. As of 2024, Tata Sons owned 7
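As the printed output shows, the loader returns a list of documents, each with a `metadata` dictionary (`title`, `summary`, `source`) and a `page_content` string. A rough sketch of flattening that list into plain text and capping its length before it goes into a prompt is given below. The helper `build_wiki_context` and its `max_chars` default are assumptions for illustration, and a `SimpleNamespace` stands in for the loader's documents so the snippet runs without LangChain.

```python
from types import SimpleNamespace


def build_wiki_context(documents, max_chars: int = 4000) -> str:
    """Join title, source and page text of each document, capped at max_chars."""
    parts = []
    for doc in documents:
        title = doc.metadata.get("title", "")
        source = doc.metadata.get("source", "")
        parts.append(f"Title: {title}\nSource: {source}\n\n{doc.page_content}")
    return "\n\n".join(parts)[:max_chars]


# Stand-in for the objects returned by loader.load(); only the two
# attributes used above are modelled.
fake_doc = SimpleNamespace(
    metadata={
        "title": "Tata Consultancy Services",
        "source": "https://en.wikipedia.org/wiki/Tata_Consultancy_Services",
    },
    page_content="Tata Consultancy Services (TCS) is an Indian multinational technology company.",
)

context = build_wiki_context([fake_doc], max_chars=200)
print(context)
```

Passing a string like this instead of the raw list keeps the Python `repr` of the documents out of the prompt and gives some control over prompt length.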

In [None]:
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field


# Schema for the fields to extract; each description is passed to the LLM
# through the parser's format instructions.
class CompanyProfile(BaseModel):
    Company_Name: str = Field(description="The Name of the Company")
    Founder: str = Field(description="The Founder of the Company")
    Start_Date: str = Field(description="The date or the founding year of the company")
    Revenue: int = Field(description="The Revenue of the company")
    Employees: str = Field(description="How many employees are working in it")
    Summary: str = Field(description="Provide a brief 4-line summary of the company")


custom_output_parser = PydanticOutputParser(pydantic_object=CompanyProfile)
print(custom_output_parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"Company_Name": {"description": "The Name of the Company", "title": "Company Name", "type": "string"}, "Founder": {"description": "The Founder of the Company", "title": "Founder", "type": "string"}, "Start_Date": {"description": "The date or the founding year of the company", "title": "Start Date", "type": "string"}, "Revenue": {"description": "The Revenue of the company", "title": "Revenue", "type": "integer"}, "Employees": {"description": "How many employees are working in it", "title": "Employees", "type": "string"}, "Summar

In [None]:
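# Prompt: the Wikipedia text plus the parser's JSON format instructions.
template = """
Take the company wiki page information as input
Company Details from Wikipedia:{wiki_page_info}
{format_instructions}
"""
prompt = PromptTemplate(
    template=template,
    input_variables=["wiki_page_info", "format_instructions"],
)

# llm = OpenAI(temperature=0)
llm = Cohere()  # reads COHERE_API_KEY from the environment

chain = LLMChain(prompt=prompt, llm=llm)

# company_documents is the list returned by loader.load(); it is rendered
# into the prompt as its string representation.
result = chain.invoke({
    "wiki_page_info": company_documents,
    "format_instructions": custom_output_parser.get_format_instructions(),
})
print(result["text"])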

 {
    "Company_Name": "Tata Consultancy Services Limited",
    "Founder": "Tata Sons Limited",
    "Start_Date": "1968",
    "Revenue": 206858.05,
    "Employees": "601,546",
    "Summary": "TCS is an Indian multinational information technology (IT) services and consulting company headquartered in Mumbai. It is a part of the Tata Group and operates in 150 locations across 46 countries."
} 
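The chain returns raw text, so the JSON still has to be turned into a usable object. Within LangChain this would normally be `custom_output_parser.parse(result["text"])`. Note that the schema declares `Revenue` as an integer while the response above contains a decimal value, so strict validation may reject it. As a dependency-free alternative, the sketch below pulls the JSON object out of the text and converts the employee count to a number. The function name `parse_profile` and the assumption that the model wraps a single JSON object in optional extra text are illustrative only.

```python
import json


def parse_profile(raw_text: str) -> dict:
    """Parse the JSON object spanning the first '{' to the last '}' in raw_text."""
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    profile = json.loads(raw_text[start:end + 1])
    # "601,546" -> 601546; leave the value untouched if it is not numeric.
    employees = str(profile.get("Employees", "")).replace(",", "").strip()
    if employees.isdigit():
        profile["Employees"] = int(employees)
    return profile


# Example input shaped like the response above, with stray text around it.
raw = """ Here is the profile:
{
    "Company_Name": "Tata Consultancy Services Limited",
    "Start_Date": "1968",
    "Revenue": 206858.05,
    "Employees": "601,546"
}
"""
profile = parse_profile(raw)
print(profile["Company_Name"], profile["Employees"])
```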
