## Exercises

By the end of this exercise, you should have a file named acquire.py that contains the specified functions. If you wish, you may break your work into separate files for each website (e.g. acquire_codeup_blog.py and acquire_news_articles.py), but the end function should be present in acquire.py (that is, acquire.py should import get_blog_articles from the acquire_codeup_blog module.)

Codeup Blog Articles

Visit Codeup's Blog and record the urls for at least 5 distinct blog posts. For each post, you should scrape at least the post's title and content.

Encapsulate your work in a function named get_blog_articles that will return a list of dictionaries, with each dictionary representing one article. The shape of each dictionary should look like this:


{
    'title': 'the title of the article',
    'content': 'the full text content of the article'
}


Plus any additional properties you think might be helpful.

Bonus: Scrape the text of all the articles linked on codeup's blog page.

News Articles

We will now be scraping text data from inshorts, a website that provides a brief overview of many different topics.

Write a function that scrapes the news articles for the following topics:

- Business
- Sports
- Technology
- Entertainment

The end product of this should be a function named get_news_articles that returns a list of dictionaries, where each dictionary has this shape:


{
    'title': 'The article title',
    'content': 'The article content',
    'category': 'business' # for example
}


Hints:

Start by inspecting the website in your browser. Figure out which elements will be useful.

Start by creating a function that handles a single article and produces a dictionary like the one above.

Next create a function that will find all the articles on a single page and call the function you created in the last step for every article on the page.

Now create a function that will use the previous two functions to scrape the articles from all the pages that you need, and do any additional processing that needs to be done.


Bonus: cache the data

Write your code such that the acquired data is saved locally in some form or fashion. Your functions that retrieve the data should prefer to read the local data instead of having to make all the requests everytime the function is called. Include a boolean flag in the functions to allow the data to be acquired "fresh" from the actual sources (re-writing your local cache).

In [228]:
import pandas as pd
import numpy as np
from requests import get
from bs4 import BeautifulSoup
import re

In [17]:
url = 'https://codeup.com/tips-for-prospective-students/tips-for-women/'
url2 = 'https://codeup.com/codeup-news/dei-report/'
url3 = 'https://codeup.com/cloud-administration/cloud-computing-and-aws/'
url4 = 'https://codeup.com/featured/financing-career-transition/'
url5 = 'https://codeup.com/codeup-news/diversity-and-inclusion-award/'

In [18]:
headers = {'User-Agent': 'Codeup Data Science'} # Some websites don't accept the pyhon-requests default user-agent
response = get(url, headers=headers)

In [100]:
soup = BeautifulSoup(response.content, 'html.parser')

In [101]:
soup.title.text.strip()

'Tips for Women Beginning a Career in Tech - Codeup'

In [102]:
soup.find_all('p')

[<p class="post-meta"><span class="published">Sep 23, 2022</span> | <a href="https://codeup.com/category/tips-for-prospective-students/" rel="category tag">Tips for Prospective Students</a></p>,
 <p>Codeup strongly values diversity, and inclusion. In honor of <a href="https://nationaldaycalendar.com/american-business-womens-day-september-22/" rel="noopener" target="_blank">American Business Women’s Day</a>, we’d like to share eight tips and pieces of advice from Codeup’s women in tech for women looking to begin their careers in tech!</p>,
 <p>Codeup works hard to close the gender inequality gap, and to diversify the tech world by producing a unique blend of tech talent. We also offer a <a href="https://codeup.com/women/">Women in Tech scholarship</a> open to anyone who identifies as a woman. Our goal is to make a career in tech accessible to all.</p>,
 <p>Join us on our journey of empowering women in tech today! <a href="https://codeup.com/apply-now/">Apply</a> or <a href="https://code

In [112]:
soup.find_all('p')[1:4]

[<p>Codeup strongly values diversity, and inclusion. In honor of <a href="https://nationaldaycalendar.com/american-business-womens-day-september-22/" rel="noopener" target="_blank">American Business Women’s Day</a>, we’d like to share eight tips and pieces of advice from Codeup’s women in tech for women looking to begin their careers in tech!</p>,
 <p>Codeup works hard to close the gender inequality gap, and to diversify the tech world by producing a unique blend of tech talent. We also offer a <a href="https://codeup.com/women/">Women in Tech scholarship</a> open to anyone who identifies as a woman. Our goal is to make a career in tech accessible to all.</p>,
 <p>Join us on our journey of empowering women in tech today! <a href="https://codeup.com/apply-now/">Apply</a> or <a href="https://codeup.com/moreinfo/">request more information</a> on our programs to jumpstart your career in tech.</p>]

In [114]:
soup = BeautifulSoup(get(url2, headers=headers).content, 'html.parser')

In [117]:
soup.find_all('p')[1:4].

[<p>Codeup is excited to launch our first Diversity Equity, and Inclusion (DEI) report! In over eight years as an organization, we’ve implemented policies and grown our DEI efforts. We are extremely proud of the progress we’ve made as a staff and Codeup community, and we recognize there is more to learn. This report captures some of the ways that we’ve lived our value of Cultivating Inclusive Growth, and how we will continue doing so as we look to the future.</p>,
 <p>We wanted to shine a light on the demographics of our students and staff, and in particular how that compares to the tech industry as a whole. How we collect, organize, and share employee demographic data is informed by standards set by the <a href="https://www.eeoc.gov/" rel="noopener" target="_blank">Equal Employment Opportunity Commission (EEOC)</a>.</p>,
 <p>We are proud to celebrate how we’ve grown and are motivated and committed to do more and be better. To view the report visit the link <a href="https://2817329.fs1

In [31]:
urls = ['https://codeup.com/tips-for-prospective-students/tips-for-women/', 'https://codeup.com/codeup-news/dei-report/',
       'https://codeup.com/cloud-administration/cloud-computing-and-aws/', 'https://codeup.com/featured/financing-career-transition/',
       'https://codeup.com/codeup-news/diversity-and-inclusion-award/']

In [125]:
def get_content(urls):
    articles = [{}]
    for url in urls:
        headers = {'User-Agent': 'Codeup Data Science'}
        response = get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        title = soup.title.text.strip()
        p1 = soup.find_all('p')[1].text.strip()
        p2 = soup.find_all('p')[2].text.strip()
        p3 = soup.find_all('p')[3].text.strip()
        p4 = soup.find_all('p')[4].text.strip()
        content = p1 + p2 + p3 + p4
        articles.append({'title':title, 'content':content})
    return articles
        

In [126]:
articles = get_content(urls)

In [127]:
articles

[{},
 {'title': 'Tips for Women Beginning a Career in Tech - Codeup',
  'content': 'Codeup strongly values diversity, and inclusion. In honor of American Business Women’s Day, we’d like to share eight tips and pieces of advice from Codeup’s women in tech for women looking to begin their careers in tech!Codeup works hard to close the gender inequality gap, and to diversify the tech world by producing a unique blend of tech talent. We also offer a Women in Tech scholarship open to anyone who identifies as a woman. Our goal is to make a career in tech accessible to all.Join us on our journey of empowering women in tech today! Apply or request more information on our programs to jumpstart your career in tech.'},
 {'title': 'Diversity Equity and Inclusion Report - Codeup',
  'content': 'Codeup is excited to launch our first Diversity Equity, and Inclusion (DEI) report! In over eight years as an organization, we’ve implemented policies and grown our DEI efforts. We are extremely proud of the

In [128]:
articles = pd.DataFrame(articles)

In [130]:
articles = articles.drop([0])

In [131]:
articles

Unnamed: 0,title,content
1,Tips for Women Beginning a Career in Tech - Co...,"Codeup strongly values diversity, and inclusio..."
2,Diversity Equity and Inclusion Report - Codeup,Codeup is excited to launch our first Diversit...
3,What is Cloud Computing and AWS? - Codeup,With many companies switching to cloud service...
4,How Can I Finance My Career Transition? - Codeup,Deciding to transition into a tech career is a...
5,Codeup Honored as SABJ Diversity and Inclusion...,Codeup has been named the 2022 Diversity and I...


In [132]:
articles.to_csv('articles.csv')

In [257]:
business = 'https://inshorts.com/en/read/business'

In [258]:
soup = BeautifulSoup(get(business).content, 'html.parser')

In [259]:
business_list = soup.select('div.container')[0].text.split('\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n')

In [260]:
del business_list[0]

In [261]:
business_list[0].replace('\n', '').replace('      ', '')

"HCLTech adds highest-ever 10,339 freshers, reports 23.8% attrition in Jul-Septshort by Ashley Paul / 09:07 pm on 12 Oct 2022,WednesdayHCLTech reported its highest-ever hiring of 10,339 freshers in July-September quarter, MD and CEO C Vijayakumar said on Wednesday. HCLTech's attrition rate in the quarter stood at 23.8%, same as that in April-June quarter. Meanwhile, the firm reported net employee addition of 8,359 employees in Q2 FY23, up from net employee addition of 2,089 employees in Q1 FY23.short by Ashley Paul / 09:07 pm on 12 Octread more at Moneycontrol"

In [262]:
business_list[0] = business_list[0].replace('\n', '').replace('      ', '')

In [284]:
article_re = r'(?P<title>.*)short by\s(?P<Author>\w+\s\w+)\s/\s(?P<date>.*2022)\,Wednesday(?P<content>.*)short'

re.findall(article_re, business_list[0])

[('HCLTech adds highest-ever 10,339 freshers, reports 23.8% attrition in Jul-Sept',
  'Ashley Paul',
  '09:07 pm on 12 Oct 2022',
  "HCLTech reported its highest-ever hiring of 10,339 freshers in July-September quarter, MD and CEO C Vijayakumar said on Wednesday. HCLTech's attrition rate in the quarter stood at 23.8%, same as that in April-June quarter. Meanwhile, the firm reported net employee addition of 8,359 employees in Q2 FY23, up from net employee addition of 2,089 employees in Q1 FY23.")]

In [285]:
business_list[0]

"HCLTech adds highest-ever 10,339 freshers, reports 23.8% attrition in Jul-Septshort by Ashley Paul / 09:07 pm on 12 Oct 2022,WednesdayHCLTech reported its highest-ever hiring of 10,339 freshers in July-September quarter, MD and CEO C Vijayakumar said on Wednesday. HCLTech's attrition rate in the quarter stood at 23.8%, same as that in April-June quarter. Meanwhile, the firm reported net employee addition of 8,359 employees in Q2 FY23, up from net employee addition of 2,089 employees in Q1 FY23.short by Ashley Paul / 09:07 pm on 12 Octread more at Moneycontrol"

In [286]:
business_list[1] = business_list[1].replace('\n', '').replace('      ', '')

In [288]:
business_list[1]

"11 lakh railway employees to get 78 days' wages as productivity bonusshort by Dharini Mudgal / 07:23 pm on 12 Oct 2022,WednesdayUnion Minister Anurag Thakur on Wednesday announced that the Centre has approved a productivity-linked bonus equivalent to the wage of 78 days for eligible non-gazetted railway employees. This will benefit more than 11 lakh non-gazetted railway employees and it will cost the government approximately ₹1,832 crore. The maximum amount payable per eligible railway employee is ₹17,951 for 78 days.short by Dharini Mudgal / 07:23 pm on 12 Oct"

In [287]:
re.findall(article_re, business_list[1])

[("11 lakh railway employees to get 78 days' wages as productivity bonus",
  'Dharini Mudgal',
  '07:23 pm on 12 Oct 2022',
  'Union Minister Anurag Thakur on Wednesday announced that the Centre has approved a productivity-linked bonus equivalent to the wage of 78 days for eligible non-gazetted railway employees. This will benefit more than 11 lakh non-gazetted railway employees and it will cost the government approximately ₹1,832 crore. The maximum amount payable per eligible railway employee is ₹17,951 for 78 days.')]

In [303]:
def get_news_articles(url):
    articles = []
    soup = BeautifulSoup(get(url).content, 'html.parser')
    article_list = soup.select('div.container')[0].text.split('\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n')
    del article_list[0]
    for article in article_list:
        article = article.replace('\n', '').replace('      ', '')
        article_re = r'(?P<title>.*)short by\s(?P<Author>\w+\s\w+)\s/\s(?P<date>.*2022)\,Wednesday(?P<content>.*)short'
        articles.append(re.findall(article_re, article))
        articles = [i for i in articles if i]
        #title, author, date, content = article_mashed
        #articles.append({'title':title, 'author':author, 'date':date, 'content':content})
    return articles

In [305]:
articles = get_news_articles(business)

In [309]:
columns = ['title','author','date','content']

In [329]:
def news_dict(articles):
    art_dict = [{}]
    for article in articles:
    #dict(zip(columns, article[0]))
        art_dict.append(dict(zip(columns, article[0])))
    return art_dict

In [334]:
news_df = pd.DataFrame(news_dict(articles)).drop(0)
news_df['category'] = 'business'
news_df

Unnamed: 0,title,author,date,content,category
1,"HCLTech adds highest-ever 10,339 freshers, rep...",Ashley Paul,09:07 pm on 12 Oct 2022,HCLTech reported its highest-ever hiring of 10...,business
2,11 lakh railway employees to get 78 days' wage...,Dharini Mudgal,07:23 pm on 12 Oct 2022,Union Minister Anurag Thakur on Wednesday anno...,business
3,Retail inflation averages at 7.02% in Jul-Sept...,Ashley Paul,07:53 pm on 12 Oct 2022,Retail inflation in India averaged at 7.02% in...,business
4,"Moonlighting a question of ethics, not legalit...",Anmol Sharma,11:14 pm on 12 Oct 2022,On being asked whether moonlighting is legal o...,business
5,Wipro to give 100% variable pay to 85% of empl...,Dharini Mudgal,09:01 pm on 12 Oct 2022,IT services firm Wipro's CEO Thierry Delaporte...,business
6,"Mercedes-Benz, Microsoft collaborate to improv...",Anisha Joneja,09:26 pm on 12 Oct 2022,Mercedes-Benz and Microsoft have collaborated ...,business
7,"MobiKwik raises more debt, changes ESOP policy...",Anisha Joneja,07:19 pm on 12 Oct 2022,MobiKwik has raised total debt of ₹55 crore si...,business
8,India-UK FTA on 'verge of collapse' over visa ...,Anisha Joneja,11:13 pm on 12 Oct 2022,After UK Home Secretary Suella Braverman quest...,business
9,Govt scraps privatisation of Bhadravati steel ...,Ambarish Awale,10:23 pm on 12 Oct 2022,The Department of Investment and Public Asset ...,business
10,"LIC sells over 2% stake in Power Grid for ₹3,0...",Dharini Mudgal,10:34 pm on 12 Oct 2022,Life Insurance Corporation of India (LIC) has ...,business


In [336]:
url = 'https://inshorts.com/en/read/sports'

In [337]:
sports = get_news_articles(url)

In [339]:
sports_dict = news_dict(sports)

In [340]:
sports_dict

[{},
 {'title': "Ben Stokes' flying effort at boundary rope to save six goes viral",
  'author': 'Anmol Sharma',
  'date': '10:52 pm on 12 Oct 2022',
  'content': 'A video has gone viral showing England all-rounder Ben Stokes\' acrobatic effort to save a shot from Australia\'s Mitchell Marsh from going for a six at long-off during the second T20I in Canberra. Reacting to the video, a fan tweeted, "How he did that so brilliantly!" Another fan wrote, "Unbelievable athleticism!" "His reaction time is outstanding," another tweet read.'},
 {'title': 'Mohammad Shami confirms he is going to Australia ahead of T20 WC, shares pics from flight',
  'author': 'Anmol Sharma',
  'date': '10:12 pm on 12 Oct 2022',
  'content': 'Pacer Mohammad Shami on Wednesday confirmed that he is going to Australia ahead of the T20 World Cup 2022. Sharing pictures from a flight on Instagram, the 32-year-old pacer wrote, "Time to fly now for T20 World Cup." Shami is reportedly in contention to replace the vacant spo

In [341]:
def news_df(news, category):
    news = pd.DataFrame(news).drop(0)
    news['category'] = category
    return news

In [343]:
sports_df = news_df(sports_dict, 'sports')

In [344]:
sports_df

Unnamed: 0,title,author,date,content,category
1,Ben Stokes' flying effort at boundary rope to ...,Anmol Sharma,10:52 pm on 12 Oct 2022,A video has gone viral showing England all-rou...,sports
2,Mohammad Shami confirms he is going to Austral...,Anmol Sharma,10:12 pm on 12 Oct 2022,Pacer Mohammad Shami on Wednesday confirmed th...,sports
3,India's national record-holding discus thrower...,Anmol Sharma,10:48 pm on 12 Oct 2022,"Kamalpreet Kaur, who holds national record in ...",sports
4,ICC names 2 strike bowlers for every team at T...,Anmol Sharma,01:53 pm on 12 Oct 2022,ICC has named two strike bowlers for every tea...,sports
5,Who are the top 5 batters as per latest T20I r...,Anmol Sharma,11:13 pm on 12 Oct 2022,New Zealand wicketkeeper-batter Devon Conway l...,sports
6,India players celebrate Hardik Pandya's birthd...,Anmol Sharma,11:23 pm on 12 Oct 2022,Several Team India players including Dinesh Ka...,sports
7,"Ambati Rayudu, Sheldon Jackson involved in fig...",Anmol Sharma,11:11 pm on 12 Oct 2022,A video has gone viral showing Baroda's Ambati...,sports
8,Ganguly's removal as BCCI Prez another example...,Nakul Ahuja,03:37 pm on 12 Oct 2022,TMC on Tuesday said that Sourav Ganguly's remo...,sports
9,How does the final medal tally of 36th Nationa...,Anmol Sharma,08:51 pm on 12 Oct 2022,The 36th National Games came to a conclusion o...,sports
10,ICC names 5 changes to playing conditions to k...,Anmol Sharma,09:51 pm on 12 Oct 2022,ICC named five changes to playing conditions t...,sports


In [345]:
url = 'https://inshorts.com/en/read/technology'

In [346]:
tech = get_news_articles(url)

In [347]:
tech_dict = news_dict(tech)

In [348]:
tech_df = news_df(tech_dict, 'technology')

In [349]:
tech_df

Unnamed: 0,title,author,date,content,category
1,"Robot addresses UK Parliament, says 'Although ...",Apaar Sharma,11:54 am on 12 Oct 2022,A 'robot artist' called Ai-Da interacted with ...,technology
2,2-seater Chinese electric flying car makes 1st...,Ridham Gambhir,09:13 am on 12 Oct 2022,An electric flying car built by the Chinese el...,technology
3,Adani Data Networks gets licence to offer tele...,Ridham Gambhir,10:44 am on 12 Oct 2022,Adani Data Networks has reportedly been grante...,technology
4,Intel plans to cut thousands of jobs amid PC m...,Ridham Gambhir,08:44 am on 12 Oct 2022,Chipmaker Intel is planning to cut thousands o...,technology
5,"HCLTech adds highest-ever 10,339 freshers, rep...",Ashley Paul,09:07 pm on 12 Oct 2022,HCLTech reported its highest-ever hiring of 10...,technology
6,"Wipro's net employee addition falls to 605, at...",Ashley Paul,05:20 pm on 12 Oct 2022,IT services firm Wipro on Wednesday said its n...,technology
7,TCS staff seeking WFH on medical grounds refer...,Ridham Gambhir,10:20 am on 12 Oct 2022,TCS employees seeking exemption from work from...,technology
8,"Facebook users complain of losing followers, Z...",Ridham Gambhir,03:10 pm on 12 Oct 2022,Several Facebook users have complained about l...,technology
9,Apple to roll out 5G software update in India ...,Ridham Gambhir,02:25 pm on 12 Oct 2022,Apple on Wednesday said that it will start upg...,technology
10,Crypto firm Blockchain.com gets Singapore lice...,Purnima Rajput,05:01 pm on 12 Oct 2022,"Blockchain.com, a cryptocurrency exchange back...",technology


In [350]:
url = 'https://inshorts.com/en/read/entertainment'

In [351]:
entertain = get_news_articles(url)

In [352]:
ent_dict = news_dict(entertain)

In [354]:
ent_df = news_df(ent_dict, 'entertainment')

In [355]:
ent_df

Unnamed: 0,title,author,date,content,category
1,"Beliefs hurt by distorting traditions, Aamir s...",Apaar Sharma,04:45 pm on 12 Oct 2022,Madhya Pradesh Home Minister Narottam Mishra h...,entertainment
2,"Ranveer Singh, Kiara Advani win Maharashtrian ...",Apaar Sharma,03:24 pm on 12 Oct 2022,Ranveer Singh and Kiara Advani were honoured a...,entertainment
3,Not true: 'Taarak Mehta...' fame Disha's broth...,Ankush Verma,01:59 pm on 12 Oct 2022,"The brother of actress Disha Vakani, known for...",entertainment
4,Can't you become bhaijaan to wronged women: Sh...,Daisy Mowke,06:24 pm on 12 Oct 2022,"Sherlyn Chopra, who had accused filmmaker Saji...",entertainment
5,American Idol runner-up Willie Spence dies in ...,Anmol Sharma,07:53 pm on 12 Oct 2022,"Singer Willie Spence, who finished runner-up i...",entertainment
6,Shah Rukh Khan books 5-star hotel rooms for fa...,Daisy Mowke,08:36 pm on 12 Oct 2022,Actor Shah Rukh Khan recently booked 5-star ho...,entertainment
7,American Idol fame Willie shared video of him ...,Anmol Sharma,09:47 pm on 12 Oct 2022,"Singer Willie Spence, who passed away in a car...",entertainment
8,Shouldn't boycott film when govt is doing best...,Amartya Sharma,09:37 pm on 12 Oct 2022,Actress Rakul Preet Singh said the film indust...,entertainment
9,"Couldn't finish watching 'Bhool Bhulaiyaa 2', ...",Amartya Sharma,07:12 pm on 12 Oct 2022,Actress Katrina Kaif has said that she couldn'...,entertainment
10,'Squid Game' actor Anupam Tripathi meets Anura...,Amartya Sharma,10:41 pm on 12 Oct 2022,"Actor Anupam Tripathi, who played the role of ...",entertainment


In [357]:
df = pd.concat([sports_df, tech_df, ent_df])

In [358]:
df

Unnamed: 0,title,author,date,content,category
1,Ben Stokes' flying effort at boundary rope to ...,Anmol Sharma,10:52 pm on 12 Oct 2022,A video has gone viral showing England all-rou...,sports
2,Mohammad Shami confirms he is going to Austral...,Anmol Sharma,10:12 pm on 12 Oct 2022,Pacer Mohammad Shami on Wednesday confirmed th...,sports
3,India's national record-holding discus thrower...,Anmol Sharma,10:48 pm on 12 Oct 2022,"Kamalpreet Kaur, who holds national record in ...",sports
4,ICC names 2 strike bowlers for every team at T...,Anmol Sharma,01:53 pm on 12 Oct 2022,ICC has named two strike bowlers for every tea...,sports
5,Who are the top 5 batters as per latest T20I r...,Anmol Sharma,11:13 pm on 12 Oct 2022,New Zealand wicketkeeper-batter Devon Conway l...,sports
6,India players celebrate Hardik Pandya's birthd...,Anmol Sharma,11:23 pm on 12 Oct 2022,Several Team India players including Dinesh Ka...,sports
7,"Ambati Rayudu, Sheldon Jackson involved in fig...",Anmol Sharma,11:11 pm on 12 Oct 2022,A video has gone viral showing Baroda's Ambati...,sports
8,Ganguly's removal as BCCI Prez another example...,Nakul Ahuja,03:37 pm on 12 Oct 2022,TMC on Tuesday said that Sourav Ganguly's remo...,sports
9,How does the final medal tally of 36th Nationa...,Anmol Sharma,08:51 pm on 12 Oct 2022,The 36th National Games came to a conclusion o...,sports
10,ICC names 5 changes to playing conditions to k...,Anmol Sharma,09:51 pm on 12 Oct 2022,ICC named five changes to playing conditions t...,sports


In [359]:
url = 'https://inshorts.com/en/read/business'

In [360]:
business = get_news_articles(url)

In [361]:
bus_dict = news_dict(business)

In [363]:
bus_df = news_df(bus_dict, 'business')

In [364]:
df = pd.concat([df, bus_df])

In [370]:
df.head(50)

Unnamed: 0,title,author,date,content,category
1,Ben Stokes' flying effort at boundary rope to ...,Anmol Sharma,10:52 pm on 12 Oct 2022,A video has gone viral showing England all-rou...,sports
2,Mohammad Shami confirms he is going to Austral...,Anmol Sharma,10:12 pm on 12 Oct 2022,Pacer Mohammad Shami on Wednesday confirmed th...,sports
3,India's national record-holding discus thrower...,Anmol Sharma,10:48 pm on 12 Oct 2022,"Kamalpreet Kaur, who holds national record in ...",sports
4,ICC names 2 strike bowlers for every team at T...,Anmol Sharma,01:53 pm on 12 Oct 2022,ICC has named two strike bowlers for every tea...,sports
5,Who are the top 5 batters as per latest T20I r...,Anmol Sharma,11:13 pm on 12 Oct 2022,New Zealand wicketkeeper-batter Devon Conway l...,sports
6,India players celebrate Hardik Pandya's birthd...,Anmol Sharma,11:23 pm on 12 Oct 2022,Several Team India players including Dinesh Ka...,sports
7,"Ambati Rayudu, Sheldon Jackson involved in fig...",Anmol Sharma,11:11 pm on 12 Oct 2022,A video has gone viral showing Baroda's Ambati...,sports
8,Ganguly's removal as BCCI Prez another example...,Nakul Ahuja,03:37 pm on 12 Oct 2022,TMC on Tuesday said that Sourav Ganguly's remo...,sports
9,How does the final medal tally of 36th Nationa...,Anmol Sharma,08:51 pm on 12 Oct 2022,The 36th National Games came to a conclusion o...,sports
10,ICC names 5 changes to playing conditions to k...,Anmol Sharma,09:51 pm on 12 Oct 2022,ICC named five changes to playing conditions t...,sports


In [367]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 84 entries, 1 to 25
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   title     84 non-null     object
 1   author    84 non-null     object
 2   date      84 non-null     object
 3   content   84 non-null     object
 4   category  84 non-null     object
dtypes: object(5)
memory usage: 3.9+ KB


In [369]:
df.category.value_counts()

business         25
sports           22
entertainment    20
technology       17
Name: category, dtype: int64