# Data and Code Management

### JSON Examples

In [1]:
import json
data = '''{
  "name" : "Chuck",
  "phone" : {
    "type" : "intl",
    "number" : "+1 734 303 4456"
   },
   "email" : {
     "hide" : "yes"
   }
}'''

info = json.loads(data)
print(type(data),type(info))

print('Name:',info["name"])
print('Hide:',info["email"]["hide"])

<class 'str'> <class 'dict'>
Name: Chuck
Hide: yes
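`json.loads` turns a JSON string into Python objects, and `json.dumps` goes the other way. Here is a minimal round-trip sketch that rebuilds the same record as a dict literal. The variable name `text` is only for illustration.

```python
import json

# A Python dict shaped like the parsed `info` object above
info = {"name": "Chuck",
        "phone": {"type": "intl", "number": "+1 734 303 4456"},
        "email": {"hide": "yes"}}

# json.dumps serializes a Python object back into a JSON string
text = json.dumps(info, indent=2)
print(type(text))                 # <class 'str'>

# Parsing the string again gives back an equal dict
print(json.loads(text) == info)   # True
```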


In [2]:
import json
input = '''[
  { "id" : "001",
    "x" : "2",
    "name" : "Chuck"
  } ,
  { "id" : "009",
    "x" : "7",
    "name" : "Chuck"
  }
]'''

info = json.loads(input)
print('User count:', len(info))
for item in info:
    print('Name', item['name'])
    print('Id', item['id'])
    print('Attribute', item['x'])

User count: 2
Name Chuck
Id 001
Attribute 2
Name Chuck
Id 009
Attribute 7


We can also load the JSON data into a pandas DataFrame. Each object in the list becomes a row, and each key becomes a column.

In [3]:
import pandas as pd
df = pd.DataFrame(info)
df

|   | id  | x | name  |
|---|-----|---|-------|
| 0 | 001 | 2 | Chuck |
| 1 | 009 | 7 | Chuck |
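The first example's record is nested: `phone` and `email` are themselves objects, so it does not map directly onto flat rows and columns. One option is `pd.json_normalize`, which flattens nested keys into dotted column names. Here is a sketch using that record:

```python
import json
import pandas as pd

# Same nested record as the first example
data = '''{"name": "Chuck",
           "phone": {"type": "intl", "number": "+1 734 303 4456"},
           "email": {"hide": "yes"}}'''

# json_normalize flattens nested objects into dotted column names
flat = pd.json_normalize(json.loads(data))
print(list(flat.columns))
# ['name', 'phone.type', 'phone.number', 'email.hide']
```

The dotted names keep the original nesting visible, so `flat['phone.number']` still reads naturally.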


In [4]:
### When the schema is different
input = '''[
  { "id" : "001",
    "x" : "2",
    "name" : "Chuck"
  } ,
  { "id" : "009",
    "x" : "7",
    "name" : "Chuck",
    "other" : "A value"
  }
]'''

info = json.loads(input)

df = pd.DataFrame(info)
df

|   | id  | x | name  | other   |
|---|-----|---|-------|---------|
| 0 | 001 | 2 | Chuck | NaN     |
| 1 | 009 | 7 | Chuck | A value |
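The NaN in the `other` column is how pandas marks the key that the first record lacks. The sketch below handles the same gap with plain dicts and in the DataFrame. It also converts the `x` column: every value in this JSON is a quoted string, so `x` is not numeric until it is converted.

```python
import json
import pandas as pd

records = json.loads('''[
  {"id": "001", "x": "2", "name": "Chuck"},
  {"id": "009", "x": "7", "name": "Chuck", "other": "A value"}
]''')

# With plain dicts, .get() avoids a KeyError for the record without "other"
for item in records:
    print(item["id"], item.get("other", "(missing)"))
# 001 (missing)
# 009 A value

df = pd.DataFrame(records)
# pandas fills the gap with NaN; isna() flags it
print(df["other"].isna().tolist())   # [True, False]

# Values arrived as JSON strings, so convert "x" before doing arithmetic
df["x"] = df["x"].astype(int)
print(df["x"].sum())                 # 9
```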


## Flask API Tutorial
This tutorial will guide you through building a simple Flask API in Python. 

Prerequisite: the Flask library (install it with `pip install Flask`).

Learning Objectives:

- Create a Flask application
- Define routes for API endpoints
- Handle different HTTP methods (GET, POST)
- Return JSON data
- Implement basic error handling

Flask will not work in JupyterHub. Students need to install Anaconda, PyCharm, or VS Code, or use Google Colab.
You can obtain Anaconda [here](https://www.anaconda.com/download).

In [None]:
#pip install flask

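This cell imports the Flask class and creates an instance of a Flask application named app. Flask's first argument is the application's import name. This is usually `__name__`, which refers to the current Python module, but here a descriptive string ("First app") is passed instead, and Flask shows it as the app name in the server log. Passing debug=True to app.run() would enable Flask's debugger, which helps identify errors during development. The runs in this notebook leave it off, as the "Debug mode: off" lines show.

The next cell defines a route for the root URL (/) using the @app.route decorator. Flask calls the function hello_world when a GET request is made to this route, and the string it returns is sent back to the client as the response.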



In [5]:
from flask import Flask

app = Flask("First app")

In [6]:

@app.route("/")
def hello_world():
    return "<p>Hello, World!</p>"


In [7]:
# Run the Flask development server; app.run() blocks this cell until you press CTRL+C
app.run(host='localhost', port=5005)

 * Serving Flask app 'First app'
 * Debug mode: off


 * Running on http://localhost:5005
Press CTRL+C to quit
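Because `app.run()` blocks the cell until you stop the server, it can be useful to check routes without starting a server at all. The sketch below uses Flask's built-in test client (`app.test_client()`), which sends requests directly to the app object. It uses `Flask(__name__)` rather than a custom name.

```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "<p>Hello, World!</p>"

# The test client calls the app directly; no server or port is needed
client = app.test_client()
response = client.get("/")
print(response.status_code)              # 200
print(response.get_data(as_text=True))   # <p>Hello, World!</p>
```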


This cell demonstrates how to create routes that accept variables. The `<name>` part of the URL pattern (`/hello/<name>`) is a placeholder that matches whatever value appears in that position of the URL. Flask passes that value to the hello_name function as the argument name, and an f-string then builds a greeting that includes it.

In [9]:
from flask import Flask

app = Flask("Second app")

@app.route('/hello/<name>')
def hello_name(name):
    # Use the `name` variable from the URL
    return f"Hello, {name}!"


In [10]:
app.run(host='localhost', port=5006)
#go to http://localhost:5006/hello/John

 * Serving Flask app 'Second app'
 * Debug mode: off


 * Running on http://localhost:5006
Press CTRL+C to quit
127.0.0.1 - - [08/Jun/2025 17:55:15] "GET /Drkae HTTP/1.1" 404 -
127.0.0.1 - - [08/Jun/2025 17:55:15] "GET /favicon.ico HTTP/1.1" 404 -
127.0.0.1 - - [08/Jun/2025 17:55:22] "GET /Drake/ HTTP/1.1" 404 -
127.0.0.1 - - [08/Jun/2025 17:55:31] "GET / HTTP/1.1" 404 -
127.0.0.1 - - [08/Jun/2025 17:55:34] "GET /hello/Drake HTTP/1.1" 200 -
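The log shows that only `/hello/Drake` matched the pattern. The requests for `/`, `/Drkae`, and `/Drake/` returned 404 because no route matches them. The sketch below adds two refinements described in the Flask documentation. First, user-supplied values are wrapped in `markupsafe.escape` before being put into HTML. Second, a converter such as `<int:n>` makes Flask validate and convert the URL segment. The `/square/<int:n>` route is hypothetical and is included only to illustrate the converter.

```python
from flask import Flask
from markupsafe import escape

app = Flask(__name__)

@app.route("/hello/<name>")
def hello_name(name):
    # escape() turns characters such as < and > into HTML entities, so a
    # value typed into the URL cannot inject markup into the page
    return f"Hello, {escape(name)}!"

@app.route("/square/<int:n>")   # hypothetical route illustrating a converter
def square(n):
    # Flask has already converted the URL segment to an int
    return str(n * n)

client = app.test_client()
print(client.get("/hello/John").get_data(as_text=True))   # Hello, John!
print(client.get("/square/12").get_data(as_text=True))    # 144
print(client.get("/square/abc").status_code)              # 404
```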


This cell introduces handling different HTTP methods (GET and POST). When the user visits /form with a GET request, the first branch of the form function runs and returns an HTML form with a text input where the user can enter their name.

When the user submits the form, the browser sends a POST request to the same URL (/form). The else branch then reads the submitted username from request.form and returns a greeting.

In [49]:
from flask import Flask, request


app = Flask("Third app")

@app.route('/form', methods=['GET', 'POST'])
def form():
    if request.method == 'GET':
        # Display a form for the user to input their name
        return '''<form method="POST">
                    <label for="username">Enter your name:</label>
                    <input type="text" name="username" id="username" placeholder="Enter your name">
                    <button type="submit">Submit</button>
                  </form>'''
    else:
        # Process the submitted form data; .get() returns '' instead of raising
        # KeyError when the 'username' field is absent
        username = request.form.get('username', '').strip()
        if not username:
            # Handle a missing or blank name
            return "Please enter your name in the form."
        return f"Hello, {username}!"


In [50]:
app.run(host='localhost', port=5006)
# http://localhost:5006/form

 * Serving Flask app 'Third app'
 * Debug mode: off


 * Running on http://localhost:5006
Press CTRL+C to quit
127.0.0.1 - - [12/Jun/2025 00:33:50] "GET / HTTP/1.1" 404 -
127.0.0.1 - - [12/Jun/2025 00:34:02] "GET /form HTTP/1.1" 200 -
127.0.0.1 - - [12/Jun/2025 00:34:12] "POST /form HTTP/1.1" 200 -
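The test-client approach can also exercise both branches of a GET/POST route. The sketch below uses a shortened version of the form above. The `data=` argument supplies the form fields that a browser would send in the POST body.

```python
from flask import Flask, request

app = Flask(__name__)

@app.route('/form', methods=['GET', 'POST'])
def form():
    if request.method == 'GET':
        # Shortened version of the form above
        return ('<form method="POST"><input type="text" name="username">'
                '<button type="submit">Submit</button></form>')
    # .get() returns '' instead of raising KeyError when the field is absent
    username = request.form.get('username', '').strip()
    if not username:
        return "Please enter your name in the form."
    return f"Hello, {username}!"

client = app.test_client()
print(client.get('/form').status_code)   # 200
print(client.post('/form', data={'username': 'Drake'}).get_data(as_text=True))
# Hello, Drake!
print(client.post('/form', data={}).get_data(as_text=True))
# Please enter your name in the form.
print(client.put('/form').status_code)   # 405
```

A method not listed in `methods` (here PUT) gets a 405 Method Not Allowed response.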


# Assignment 

You have been introduced to Python data types, data structures, data frames, data formats such as JSON and CSV, and interfaces. You have also been introduced to the Python Flask package for building web pages. The purpose of this assignment is to reinforce and assess your learning of these Python features through practice.

## Question 1: Read the crunchbase_odm_orgs.csv CSV file into a pandas data frame and drop records whose company name is missing (None/NaN):
- Print the count of records (rows) in the resulting data frame
- Print the first 5 records of the data frame


In [1]:
import pandas as pd

crunchbase = pd.read_csv("crunchbase_odm_orgs.csv")
# Drop records with no company name (read_csv loads empty cells as NaN).
# crunchbase.info() showed a name in every row, so no rows are removed here.
df1 = crunchbase.dropna(subset=['name'])
print('There are', len(df1), 'rows in the crunchbase CSV.')
print('First five records:')
df1.head(5)

There are 9999 rows in the crunchbase CSV.
First five records:


|   | uuid | name | type | primary_role | cb_url | domain | homepage_url | logo_url | facebook_url | twitter_url | linkedin_url | combined_stock_symbols | city | region | country_code | short_description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | e1393508-30ea-8a36-3f96-dd3226033abd | Wetpaint | organization | company | https://www.crunchbase.com/organization/wetpai... | wetpaint.com | http://www.wetpaint.com/ | https://res.cloudinary.com/crunchbase-producti... | https://www.facebook.com/Wetpaint | https://twitter.com/wetpainttv | https://www.linkedin.com/company/wetpaint | NaN | New York | New York | USA | Wetpaint offers an online social publishing pl... |
| 1 | bf4d7b0e-b34d-2fd8-d292-6049c4f7efc7 | Zoho | organization | company | https://www.crunchbase.com/organization/zoho?u... | zoho.com | https://www.zoho.com/ | https://res.cloudinary.com/crunchbase-producti... | http://www.facebook.com/zoho | http://twitter.com/zoho | http://www.linkedin.com/company/zoho-corporati... | NaN | Pleasanton | California | USA | Zoho offers a suite of business, collaboration... |
| 2 | 5f2b40b8-d1b3-d323-d81a-b7a8e89553d0 | Digg | organization | company | https://www.crunchbase.com/organization/digg?u... | digg.com | http://www.digg.com | https://res.cloudinary.com/crunchbase-producti... | http://www.facebook.com/digg | http://twitter.com/digg | http://www.linkedin.com/company/digg | NaN | New York | New York | USA | Digg Inc. operates a website that enables its ... |
| 3 | df662812-7f97-0b43-9d3e-12f64f504fbb | Facebook | organization | company | https://www.crunchbase.com/organization/facebo... | facebook.com | http://www.facebook.com | https://res.cloudinary.com/crunchbase-producti... | https://www.facebook.com/facebook/ | https://twitter.com/facebook | http://www.linkedin.com/company/facebook | nasdaq:FB | Menlo Park | California | USA | Facebook is an online social networking servic... |
| 4 | b08efc27-da40-505a-6f9d-c9e14247bf36 | Accel | organization | investor | https://www.crunchbase.com/organization/accel?... | accel.com | http://www.accel.com | https://res.cloudinary.com/crunchbase-producti... | http://www.facebook.com/accel | http://twitter.com/accel | https://www.linkedin.com/company/accel-vc/ | NaN | Palo Alto | California | USA | Accel is an early and growth-stage venture cap... |
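`dropna(subset=['name'])` removes records with a missing company name, and `isna().sum()` counts how many such rows exist before dropping. The CSV itself is not reproduced here, so the sketch below uses a small hypothetical frame.

```python
import pandas as pd

# Hypothetical three-row sample standing in for the CSV
sample = pd.DataFrame({"name": ["Wetpaint", None, "Zoho"],
                       "city": ["New York", "Boston", "Pleasanton"]})

print(sample["name"].isna().sum())   # 1 row has no name

# dropna(subset=...) removes rows whose "name" is missing (None or NaN)
named = sample.dropna(subset=["name"])
print(len(sample), len(named))       # 3 2
```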


## Question 2: Create a data frame that contains only the records of USA-based companies whose name starts with "Ac":
- Print the count of records (rows) in the resulting data frame
- Print the first 5 records of the data frame


In [2]:
# Keep USA-based companies whose name starts with "Ac";
# na=False treats a missing name as a non-match
crunchbase_usa = df1[(df1['country_code'] == 'USA') & (df1['name'].str.startswith('Ac', na=False))]

print('There are', len(crunchbase_usa), 'USA-based companies in the crunchbase CSV whose names start with Ac.')
crunchbase_usa.head(5)

There are 31 USA-based companies in the crunchbase CSV whose names start with Ac.


|   | uuid | name | type | primary_role | cb_url | domain | homepage_url | logo_url | facebook_url | twitter_url | linkedin_url | combined_stock_symbols | city | region | country_code | short_description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | b08efc27-da40-505a-6f9d-c9e14247bf36 | Accel | organization | investor | https://www.crunchbase.com/organization/accel?... | accel.com | http://www.accel.com | https://res.cloudinary.com/crunchbase-producti... | http://www.facebook.com/accel | http://twitter.com/accel | https://www.linkedin.com/company/accel-vc/ | NaN | Palo Alto | California | USA | Accel is an early and growth-stage venture cap... |
| 186 | ed54b2d5-f2f1-d4e9-9bbe-40961ff08d44 | Action Engine | organization | company | https://www.crunchbase.com/organization/action... | actionengine.com | http://www.actionengine.com | https://res.cloudinary.com/crunchbase-producti... | NaN | NaN | https://www.linkedin.com/company/actionengine | NaN | Foster City | California | USA | Action Engine is an agile software development... |
| 278 | 64a76f34-4e68-2eb6-d943-9465f39155cd | ActiveWorlds | organization | company | https://www.crunchbase.com/organization/active... | activeworlds.com | http://www.activeworlds.com | https://res.cloudinary.com/crunchbase-producti... | http://www.facebook.com/activeworlds3d | http://twitter.com/activeworlds3d | NaN | NaN | Las Vegas | Nevada | USA | Active Worlds is a 3D virtual reality platform... |
| 1244 | ef598b5b-d588-c21f-9b93-be78443f4e22 | Acquia | organization | company | https://www.crunchbase.com/organization/acquia... | acquia.com | http://acquia.com | https://res.cloudinary.com/crunchbase-producti... | http://www.facebook.com/acquia | http://twitter.com/Acquia | http://www.linkedin.com/company/167056 | NaN | Boston | Massachusetts | USA | Acquia specializes in providing cloud-based di... |
| 1879 | 135f427d-7f25-9492-1ffe-801b35b6b988 | Academic Capital Exchange | organization | company | https://www.crunchbase.com/organization/academ... | academicapital.com | http://www.academicapital.com | https://res.cloudinary.com/crunchbase-producti... | NaN | NaN | NaN | NaN | Chicago | Illinois | USA | Academic Capital Exchange (ACE), a peer-to-pee... |
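By default, `str.startswith` can return a missing value rather than `False` for a missing name. Depending on the pandas version and column dtype, this can leave non-boolean values in the combined mask. Passing `na=False` keeps the mask purely boolean. The sketch below uses hypothetical rows; note that the match is case-sensitive.

```python
import pandas as pd

# Hypothetical rows: one missing name and one lowercase "ac" prefix
df = pd.DataFrame({"name": ["Accel", None, "acme", "Zoho"],
                   "country_code": ["USA", "USA", "USA", "USA"]})

# na=False treats the missing name as a non-match; startswith is
# case-sensitive, so "acme" does not match "Ac"
mask = (df["country_code"] == "USA") & df["name"].str.startswith("Ac", na=False)
print(df.loc[mask, "name"].tolist())   # ['Accel']
```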


## Question 3: Convert the data frame from the previous step into a list of JSON objects:
- Print the count of JSON objects
- Print the first 5 JSON objects
- Write all JSON objects into a text (JSON string) file
- Print the number of records in the resulting file


In [3]:
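import json

# Convert each row of the filtered frame into a dict (one JSON object per row)
json_obj = crunchbase_usa.to_dict(orient='records')
print("Count of JSON objects:", len(json_obj))

print("First 5 JSON objects:")
for obj in json_obj[:5]:
    print(obj)

# Write all objects to a JSON file; to_json stores missing values (NaN) as null
crunchbase_usa.to_json('output.json', orient='records')

# Read the file back to confirm how many records it holds
with open('output.json', 'r') as file:
    json_data = json.load(file)

print("\nNumber of records in the resulting file:", len(json_data))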

Count of JSON objects: 31
First 5 JSON objects:
{'uuid': 'b08efc27-da40-505a-6f9d-c9e14247bf36', 'name': 'Accel', 'type': 'organization', 'primary_role': 'investor', 'cb_url': 'https://www.crunchbase.com/organization/accel?utm_source=crunchbase&utm_medium=export&utm_campaign=odm_csv', 'domain': 'accel.com', 'homepage_url': 'http://www.accel.com', 'logo_url': 'https://res.cloudinary.com/crunchbase-production/image/upload/kxcwecxf439wsgluv7jv', 'facebook_url': 'http://www.facebook.com/accel', 'twitter_url': 'http://twitter.com/accel', 'linkedin_url': 'https://www.linkedin.com/company/accel-vc/', 'combined_stock_symbols': nan, 'city': 'Palo Alto', 'region': 'California', 'country_code': 'USA', 'short_description': 'Accel is an early and growth-stage venture capital firm that powers a global community of entrepreneurs.'}
{'uuid': 'ed54b2d5-f2f1-d4e9-9bbe-40961ff08d44', 'name': 'Action Engine', 'type': 'organization', 'primary_role': 'company', 'cb_url': 'https://www.crunchbase.com/organiza
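Missing values need care when the file is written. `to_dict` leaves them as float NaN, and `json.dumps`/`json.dump` write NaN as a bare `NaN` token. Python's `json` module accepts that token, but it is not valid JSON for most other parsers. The sketch below compares this with `DataFrame.to_json`, which writes `null`. The two-row frame and the temporary path are illustrative.

```python
import json
import os
import tempfile

import pandas as pd

# Illustrative two-row frame; the NaN mimics an empty cell loaded by read_csv
df = pd.DataFrame({"name": ["Accel", "Activision Blizzard"],
                   "combined_stock_symbols": [float("nan"), "nasdaq:ATVI"]})
records = df.to_dict(orient="records")

# json.dumps keeps the float NaN as a bare NaN token (not valid JSON)
print("NaN" in json.dumps(records))   # True

# to_json writes missing values as null, which any JSON parser accepts
path = os.path.join(tempfile.mkdtemp(), "output.json")
df.to_json(path, orient="records")

with open(path) as file:
    data = json.load(file)

print(len(data))                           # 2
print(data[0]["combined_stock_symbols"])   # None
```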

## Question 4: Read the JSON objects from the created file back into a data frame, filter for companies based in New York (city):
- Print the records in the resulting data frame
- Write the output to a webpage using Flask

In [4]:
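import json
import pandas as pd

# Read the JSON objects back from the file written in Question 3
with open('output.json') as file:
    df = pd.DataFrame(json.load(file))

# Keep only companies whose city is New York
companies_ny = df[df['city'] == 'New York']

print('NY based companies:')
companies_ny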

NY based companies:


|   | uuid | name | type | primary_role | cb_url | domain | homepage_url | logo_url | facebook_url | twitter_url | linkedin_url | combined_stock_symbols | city | region | country_code | short_description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23 | 41d88d51-b45f-83c8-341a-957027e036f7 | ActiveCause | organization | company | https://www.crunchbase.com/organization/active... | activecause.com | http://activecause.com | https://res.cloudinary.com/crunchbase-producti... |  | http://twitter.com/activecause | https://www.linkedin.com/in/hankejh |  | New York | New York | USA | ActiveCause brings together nonprofits, corpor... |
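The question also asks for the output on a webpage. One possible approach is a route that returns `DataFrame.to_html()`. The `/ny` route name and the one-row stand-in frame in the sketch below are hypothetical.

```python
import pandas as pd
from flask import Flask

app = Flask(__name__)

# Stand-in for companies_ny; in the notebook this would be the filtered frame
companies_ny = pd.DataFrame({"name": ["ActiveCause"], "city": ["New York"],
                             "domain": ["activecause.com"]})

@app.route("/ny")   # hypothetical route name
def ny_companies():
    # DataFrame.to_html renders the frame as an HTML <table>
    return "<h1>NY based companies</h1>" + companies_ny.to_html(index=False)

client = app.test_client()
html = client.get("/ny").get_data(as_text=True)
print("<table" in html, "ActiveCause" in html)   # True True
```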


## Question 5: Implement Flask API functions to return a list of Crunchbase companies based in a user-submitted city from the original dataset (4 Points)
- Take user input through a form and return a list of JSON objects for the queried city

In [5]:
from flask import Flask, Response, request


app = Flask("Company Locator")

@app.route('/form', methods=['GET', 'POST'])
def form():
    if request.method == 'GET':
        # Display a form for the user to input a city name
        return '''<form method="POST">
                    <label for="city_name">Enter the name of a city to locate companies:</label>
                    <input type="text" name="city_name" id="city_name" placeholder="City Name">
                    <button type="submit">Submit</button>
                  </form>'''
    else:
        # Filter the original dataset (df1) on the submitted city
        city_name = request.form.get('city_name', '').strip()
        df2 = df1[df1['city'] == city_name]
        # to_json writes missing values as null, so the response is valid JSON
        return Response(df2.to_json(orient='records'), mimetype='application/json')

In [6]:
app.run(host='localhost', port=5006)
# http://localhost:5006/form

 * Serving Flask app 'Company Locator'
 * Debug mode: off


 * Running on http://localhost:5006
Press CTRL+C to quit
127.0.0.1 - - [12/Jun/2025 13:23:21] "GET /form HTTP/1.1" 200 -
127.0.0.1 - - [12/Jun/2025 13:23:25] "POST /form HTTP/1.1" 200 -
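The filter above matches city names exactly, so `new york` would not find `New York`. The sketch below shows a case-insensitive variant using `Series.str.casefold` and checks it with the test client. The three-row frame is a hypothetical stand-in for df1, built from rows shown in Question 1.

```python
import pandas as pd
from flask import Flask, Response, request

app = Flask(__name__)

# Hypothetical stand-in for df1 (the original dataset)
df1 = pd.DataFrame({"name": ["Wetpaint", "Zoho", "Digg"],
                    "city": ["New York", "Pleasanton", "New York"]})

@app.route('/form', methods=['POST'])
def city_lookup():
    city = request.form.get('city_name', '').strip()
    # casefold() makes the comparison case-insensitive ("new york" == "New York")
    matches = df1[df1['city'].str.casefold() == city.casefold()]
    return Response(matches.to_json(orient='records'), mimetype='application/json')

client = app.test_client()
resp = client.post('/form', data={'city_name': 'new york'})
print([row['name'] for row in resp.get_json()])   # ['Wetpaint', 'Digg']
```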


## Question 6: Git Version Control

- Initialize a Git repository for your project directory.
- Commit your code (data.py, Flask application script) with a descriptive message (e.g., "Initial commit: Flask API with data and route").
- Push your code to a remote Git repository (e.g., GitHub). Submit the repository URL in the answer cell.


In [None]:
#Write git repo link

# Assignment End