# Bangalore Coders Data Analysis

You ask Sam Altman for 2.2 billion USD in funding, and he makes fun of you in public.

He asks you where you live and finally, after talking to you, gives you a challenging task.

Collect raw Instagram data of all OpenAI followers. 

Answer these questions:

- Who has the most posts?
- Who has the most followers?
- Who follows the most people?
- How many categories (digital creators, non-profit foundations, etc.) are there, and how many people fall into each?

You have 24 hours.

You then ask for data, and he laughs!

# Data Collection

You hire Vijayalakshmi Iyer and Sam Chandra, who are from Hebbal and HSR Layout respectively.

HSR Layout and Hebbal are far apart, so you set up a meeting at Rameshwaram Cafe, Indiranagar.

In [1]:
# Step 1: Read file
with open("initialdata.txt", encoding="utf-8") as f:
    data = f.read()

In [2]:
# Step 2: Split into chunks
chunks = []

parts = data.split("\n\n")

for part in parts:
    if len(part) > 3:
        chunks.append(part)
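The split above relies on profiles being separated by exactly one empty line. As a hedged sketch, assuming the raw file might also contain Windows line endings or blank lines that hold stray spaces, a regex-based splitter could look like this. The name `split_profiles` and the `min_length` parameter are hypothetical, and unlike the cell above it strips each block before measuring its length.

```python
import re


def split_profiles(text, min_length=4):
    """Split raw text into profile blocks separated by one or more blank lines."""
    # Normalise Windows line endings so only "\n" has to be handled.
    normalized = text.replace("\r\n", "\n")
    # A separator is a newline, optional whitespace (including further
    # newlines), and another newline.
    blocks = re.split(r"\n\s*\n", normalized)
    # Drop empty or near-empty blocks, mirroring the len(part) > 3 check.
    return [block.strip() for block in blocks if len(block.strip()) >= min_length]


sample = (
    "intaglobal\n1,946 posts\n\n  \n"
    "_anujsinghal\n1,785 posts\r\n\r\n"
    "code_flare\n165 posts\n\n\n"
)
blocks = split_profiles(sample)
print(len(blocks))
```

For input that already uses single empty lines as separators, this should produce the same blocks as the loop above, apart from the extra stripping.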

In [3]:
chunks

['intaglobal\n1,946 posts\n6,851 followers\n262 following\nINTA\nNonprofit organization\nThe association of ™ professionals.\ninta.campsite.bio',
 '_anujsinghal\n1,785 posts\n681K followers\n248 following\nAnuj Singhal\nDigital creator\nManaging Editor, CNBC-Awaaz. Most trusted and followed biz anchor in India\nLet’s talk about finance and life\nContact: Singhalanuj1010@gmail.com\nhindi.cnbctv18.com/market-gurukul',
 'code_flare\n165 posts\n216 followers\n10 following\nThe Journey Notebook | Travel | Food Blogger\nBlogger\n📍 Exploring places & phases of life 🌍\n.\n🍵 From chai breaks to flight takes ✈️\n.\n💫 Travel | Food | Daily life 🌸\nyoutube.com/@thejourneynotebook?si=Z1lPUAl-cPgHAP_a',
 'bangalore_tech_bro\n402 posts\n12.5K followers\n890 following\nRahul | HSR Hustler\nEntrepreneur\n🚀 Building the next Unicorn in Fintech\n☕ 3rd Wave Coffee addict\n💻 Python | React | AI\n📍 HSR Layout, BLR\nlinktr.ee/rahulbuilds',
 'silkboard_survivor\n55 posts\n1,2

In [4]:
def convert_number(text):
    """Convert numbers with K/M suffix to integers."""
    text = text.replace(",", "")
    if "K" in text:
        return int(float(text.replace("K", "")) * 1_000)
    if "M" in text:
        return int(float(text.replace("M", "")) * 1_000_000)
    return int(text)
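`convert_number` multiplies a float and then truncates with `int()`, so a product that lands just below a whole number would lose one. It also accepts a `K` or `M` anywhere in the text. The sketch below is one possible stricter variant, not a replacement the notebook requires: it uses `decimal.Decimal` for exact arithmetic and a regular expression to validate the input. The name `parse_count` and the assumption that only `K` and `M` suffixes occur are mine.

```python
import re
from decimal import Decimal

# Assumed suffixes: only K (thousand) and M (million) appear in the data.
SUFFIX_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}
COUNT_PATTERN = re.compile(r"^\s*([\d,]*\.?\d+)\s*([KkMm]?)\s*$")


def parse_count(text):
    """Convert strings such as '1,946', '12.5K' or '2.3M' to an integer."""
    match = COUNT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unrecognised count: {text!r}")
    number, suffix = match.groups()
    # Decimal keeps '12.5' exact, so the multiplication cannot drift.
    value = Decimal(number.replace(",", "")) * SUFFIX_MULTIPLIERS[suffix.upper()]
    return int(value)


for raw in ["1,946", "681K", "12.5K", "262"]:
    print(raw, "->", parse_count(raw))
```

Raising `ValueError` on unexpected text makes malformed lines visible instead of letting them slip through as wrong numbers.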


In [5]:
# Step 3: Robust chunk parser
def parse_chunk(chunk):
    """Parse a chunk of Instagram-like profile data into a dictionary."""
    lines = chunk.strip().split("\n")

    # Expected layout: username, posts, followers, following, name,
    # page type, then at least one bio line. Skip anything shorter.
    if len(lines) < 7:
        return None

    return {
        "username": lines[0],
        "no_of_posts": convert_number(lines[1].split(" post")[0]),
        "no_of_followers": convert_number(lines[2].split(" follower")[0]),
        "no_of_following": convert_number(lines[3].split(" following")[0]),
        "name": lines[4],
        "type_of_page": lines[5],
        # The bio may span several lines; keep its line breaks.
        "bio": "\n".join(lines[6:]),
    }


In [6]:
# Step 4: Parse all chunks safely
all_data = []

for chunk in chunks:
    parsed = parse_chunk(chunk)
    if parsed is not None:
        all_data.append(parsed)


# Preview result
all_data[2:4]


[{'username': 'code_flare',
  'no_of_posts': 165,
  'no_of_followers': 216,
  'no_of_following': 10,
  'name': 'The Journey Notebook | Travel | Food Blogger',
  'type_of_page': 'Blogger',
  'bio': '📍 Exploring places & phases of life 🌍\n.\n🍵 From chai breaks to flight takes ✈️\n.\n💫 Travel | Food | Daily life 🌸\nyoutube.com/@thejourneynotebook?si=Z1lPUAl-cPgHAP_a'},
 {'username': 'bangalore_tech_bro',
  'no_of_posts': 402,
  'no_of_followers': 12500,
  'no_of_following': 890,
  'name': 'Rahul | HSR Hustler',
  'type_of_page': 'Entrepreneur',
  'bio': '🚀 Building the next Unicorn in Fintech\n☕ 3rd Wave Coffee addict\n💻 Python | React | AI\n📍 HSR Layout, BLR\nlinktr.ee/rahulbuilds'}]
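The loop in Step 4 drops any chunk for which the parser returns `None`, and a malformed count would raise an exception and stop the run. As a rough sketch, assuming you want to see which chunks were skipped and why, a small wrapper can collect both outcomes. `parse_with_report` and `toy_parser` are hypothetical names; `toy_parser` is a simplified stand-in for `parse_chunk` so the block runs on its own, and its input strings are made up for the demonstration.

```python
def parse_with_report(chunks, parser):
    """Run parser on every chunk and return (parsed, rejected).

    rejected holds (index, reason) pairs for chunks that were skipped.
    """
    parsed, rejected = [], []
    for index, chunk in enumerate(chunks):
        try:
            result = parser(chunk)
        except (ValueError, IndexError) as error:
            # Malformed numbers or missing lines end up here.
            rejected.append((index, repr(error)))
            continue
        if result is None:
            rejected.append((index, "parser returned None"))
        else:
            parsed.append(result)
    return parsed, rejected


def toy_parser(chunk):
    """Simplified stand-in for parse_chunk: username and post count only."""
    lines = chunk.split("\n")
    if len(lines) < 2:
        return None
    return {"username": lines[0], "no_of_posts": int(lines[1].split(" post")[0])}


parsed, rejected = parse_with_report(
    ["code_flare\n165 posts", "lonely_line", "bad_count\nmany posts"],
    toy_parser,
)
print(parsed)
print(rejected)
```

In the notebook itself, the call would presumably be `parse_with_report(chunks, parse_chunk)`, with `rejected` inspected before moving on to the analysis.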

# Who has the maximum posts?
Let's write the code to find the user with the most posts.

In [7]:
max_posts_user = max(all_data, key=lambda x: x["no_of_posts"])

print("User with maximum posts:")
print(max_posts_user)

User with maximum posts:
{'username': 'startuphub_blr', 'no_of_posts': 2300, 'no_of_followers': 45000, 'no_of_following': 120, 'name': 'Startup Hub Bangalore', 'type_of_page': 'Media', 'bio': '🦄 News from the Silicon Valley of India\n📢 Funding alerts, Hiring trends, and Drama\n📩 DM for features\nstartuphub.blr/newsletter'}


# Who has the maximum followers?
Let's write the code to find the user with the most followers.

In [8]:
max_followers_user = max(all_data, key=lambda x: x["no_of_followers"])

print("User with maximum followers:")
print(max_followers_user)


User with maximum followers:
{'username': '_anujsinghal', 'no_of_posts': 1785, 'no_of_followers': 681000, 'no_of_following': 248, 'name': 'Anuj Singhal', 'type_of_page': 'Digital creator', 'bio': 'Managing Editor, CNBC-Awaaz. Most trusted and followed biz anchor in India\nLet’s talk about finance and life\nContact: Singhalanuj1010@gmail.com\nhindi.cnbctv18.com/market-gurukul'}


# Who follows the most people?
Let's write the code to find the user who follows the most accounts.

In [9]:
# Use a distinct name so the result from the followers question is not overwritten
max_following_user = max(all_data, key=lambda x: x["no_of_following"])

print("User with maximum following:")
print(max_following_user)

User with maximum following:
{'username': 'katiasales2220', 'no_of_posts': 107, 'no_of_followers': 219, 'no_of_following': 1401, 'name': 'Katia Sales', 'type_of_page': 'Alfabetização e Reforço Escolar', 'bio': 'Ensino Fundamental I e II\nPreparação para Escola Militar\nPortuguês e Matemática\nCursando Psicopedagogia Estácio'}
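The three questions above share one pattern: take `max` over the profiles with a different key each time. A minimal sketch of that pattern as a single helper follows. `top_by_metric` is a hypothetical name, and `sample_profiles` is only a subset of the parsed profiles printed in this notebook (name, page type and bio omitted), so the block runs without the data file.

```python
# Subset of the profiles shown in the outputs above.
sample_profiles = [
    {"username": "code_flare", "no_of_posts": 165,
     "no_of_followers": 216, "no_of_following": 10},
    {"username": "bangalore_tech_bro", "no_of_posts": 402,
     "no_of_followers": 12500, "no_of_following": 890},
    {"username": "startuphub_blr", "no_of_posts": 2300,
     "no_of_followers": 45000, "no_of_following": 120},
    {"username": "_anujsinghal", "no_of_posts": 1785,
     "no_of_followers": 681000, "no_of_following": 248},
    {"username": "katiasales2220", "no_of_posts": 107,
     "no_of_followers": 219, "no_of_following": 1401},
]


def top_by_metric(profiles, metrics):
    """Return {metric: username of the profile with the largest value}."""
    return {
        metric: max(profiles, key=lambda profile: profile[metric])["username"]
        for metric in metrics
    }


leaders = top_by_metric(
    sample_profiles, ["no_of_posts", "no_of_followers", "no_of_following"]
)
for metric, username in leaders.items():
    print(f"{metric}: {username}")
```

Note that `max` returns the first profile it encounters when several share the top value, so ties are not reported by this sketch.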


# How many categories?
Let's write the code to find the number of distinct categories.

In [10]:
# Collect the distinct page types; a set ignores duplicates
categories = set()
for profile in all_data:
    categories.add(profile['type_of_page'])
print(categories, len(categories))

{'Personal Blog', 'Blogger', 'Food & Drink', 'Media', 'Community', 'Entrepreneur', 'Nonprofit organization', 'Public Figure', 'Alfabetização e Reforço Escolar', 'Digital creator', 'Investor', 'Gamer'} 12
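The set gives the number of distinct categories, but the original question also asks how many people fall into each one. One way to get that, sketched here with `collections.Counter`, is to count the `type_of_page` values directly. `sample_profiles` is a five-profile subset taken from the outputs above, not the full dataset. Since this notebook reports 12 categories across 12 saved profiles, each category would presumably appear once in the full run as well.

```python
from collections import Counter

# Subset of the parsed profiles shown earlier (other fields omitted).
sample_profiles = [
    {"username": "intaglobal", "type_of_page": "Nonprofit organization"},
    {"username": "_anujsinghal", "type_of_page": "Digital creator"},
    {"username": "code_flare", "type_of_page": "Blogger"},
    {"username": "bangalore_tech_bro", "type_of_page": "Entrepreneur"},
    {"username": "startuphub_blr", "type_of_page": "Media"},
]

# Count how many profiles belong to each page type.
category_counts = Counter(profile["type_of_page"] for profile in sample_profiles)

print("Distinct categories:", len(category_counts))
for category, count in category_counts.most_common():
    print(f"{category}: {count}")
```

With the real data, replacing `sample_profiles` with `all_data` should answer both halves of the question in one pass.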


In [11]:
import pandas as pd

# Convert your list of dictionaries into a structured table
df = pd.DataFrame(all_data)

# Export to CSV (index=False prevents an extra column of numbers)
df.to_csv('finaldata.csv', index=False)

print(f"Success! Saved {len(df)} profiles to finaldata.csv")

Success! Saved 12 profiles to finaldata.csv
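Bios contain line breaks and commas, so it may be worth checking that they survive a CSV round trip. The sketch below uses the standard-library `csv` module with an in-memory buffer rather than pandas and the real `finaldata.csv`. That is an assumption made to keep the example self-contained, and the profile is a shortened version of one shown earlier.

```python
import csv
import io

# Shortened copy of a profile from the outputs above (emoji lines left out).
profile = {
    "username": "bangalore_tech_bro",
    "no_of_followers": 12500,
    "bio": "Python | React | AI\nHSR Layout, BLR\nlinktr.ee/rahulbuilds",
}

# Write one row to an in-memory CSV; newline="" leaves line endings untouched.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(profile))
writer.writeheader()
writer.writerow(profile)

# Read it back and compare with the original values.
buffer.seek(0)
rows = list(csv.DictReader(buffer))

print(rows[0]["bio"] == profile["bio"])  # multi-line bio is preserved
print(type(rows[0]["no_of_followers"]))  # csv returns every field as a string
```

The quoted multi-line field comes back intact, but the plain `csv` reader returns every field as text, so counts need converting back to `int` after loading.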
