The purpose of this notebook is to work out and fine-tune the pandas operations that the MCP server will use.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('labeled.csv', index_col=0)

In [3]:
sample = df[df['City'] == 'Richardson']
sample

Unnamed: 0,City,State,Population,Percent employed,Occupation (MBSA),Occupation (S),Occupation (SO),Occupation (RCM),Occupation (PT),Median household income,Homeownership rate,Median home price,Median rent,KMeans,Hierarchical
1353,Richardson,Texas,118269.0,70.8,56.8,11.5,20.2,4.2,7.2,96257.0,50.8,405600.0,1825.0,7,1


In [13]:
sample = sample.to_dict('records')
sample

[{'City': 'Richardson',
  'State': 'Texas',
  'Population': 118269.0,
  'Percent employed': 70.8,
  'Occupation (MBSA)': 56.8,
  'Occupation (S)': 11.5,
  'Occupation (SO)': 20.2,
  'Occupation (RCM)': 4.2,
  'Occupation (PT)': 7.2,
  'Median household income': 96257.0,
  'Homeownership rate': 50.8,
  'Median home price': 405600.0,
  'Median rent': 1825.0,
  'KMeans': 7,
  'Hierarchical': 1}]
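One thing to watch with this lookup: filtering on 'City' alone can match more than one row if the same city name appears in several states, and it returns an empty list if the name is not in the data. Here is a minimal sketch of both cases. The frame below is a made-up stand-in with placeholder names and values (not rows from labeled.csv), and filtering on 'State' as well is my assumption about how the server might disambiguate.

```python
import pandas as pd

# Placeholder rows for illustration only; these are NOT values from labeled.csv.
df = pd.DataFrame({
    'City': ['Alpha', 'Alpha', 'Beta'],
    'State': ['One', 'Two', 'One'],
    'Population': [10.0, 20.0, 30.0],
})

# Filtering on the city name alone returns every matching row.
by_city = df[df['City'] == 'Alpha'].to_dict('records')

# Combining two boolean masks with & narrows the match to a single state.
by_city_state = df[(df['City'] == 'Alpha') & (df['State'] == 'Two')].to_dict('records')

# A name that is not in the data gives an empty list, so indexing [0] would fail.
missing = df[df['City'] == 'Nowhere'].to_dict('records')

print(len(by_city), len(by_city_state), missing)
```

If the server only ever receives a city name, checking the length of the records list before indexing into it is probably the simplest safeguard.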

In [5]:
categories = ['Population', 'Percent employed', 'Occupation (MBSA)', 'Occupation (S)', 'Occupation (SO)', 
              'Occupation (RCM)', 'Occupation (PT)', 'Median household income', 'Homeownership rate',
              'Median home price', 'Median rent']
population = df[categories].agg('median')
population

Population                   2449.0
Percent employed               62.4
Occupation (MBSA)              32.3
Occupation (S)                 16.8
Occupation (SO)                19.7
Occupation (RCM)               12.2
Occupation (PT)                13.7
Median household income     66176.0
Homeownership rate             72.4
Median home price          211500.0
Median rent                  1049.0
dtype: float64

I need to convert this pandas Series into the same list-of-records format as the sample above.

In [10]:
population = pd.DataFrame(population).T.to_dict('records')
population

[{'Population': 2449.0,
  'Percent employed': 62.4,
  'Occupation (MBSA)': 32.3,
  'Occupation (S)': 16.8,
  'Occupation (SO)': 19.7,
  'Occupation (RCM)': 12.2,
  'Occupation (PT)': 13.7,
  'Median household income': 66176.0,
  'Homeownership rate': 72.4,
  'Median home price': 211500.0,
  'Median rent': 1049.0}]
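A possible shortcut, noted here for comparison: a Series can be converted to a dictionary directly with `Series.to_dict()`, which skips the DataFrame and transpose steps. The sketch below uses a shortened stand-in for the median Series (three of the values from the output above) and checks that both routes give the same record.

```python
import pandas as pd

# Shortened stand-in for the median Series; values copied from the output above.
population = pd.Series({
    'Population': 2449.0,
    'Percent employed': 62.4,
    'Median rent': 1049.0,
})

# Route used above: Series -> one-column DataFrame -> transpose -> list of records.
via_frame = pd.DataFrame(population).T.to_dict('records')

# Shorter route: convert the Series to a {label: value} dict and wrap it in a list.
direct = [population.to_dict()]

print(via_frame == direct)
```

Either form works for the next step, since only the first element of the list is used.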

Cool. Now I need to bundle this up into a list of two dictionaries, one for the specific city and the other for the region. These two records will be returned to the LLM, which will be responsible for making the comparisons.

In [16]:
final_list = [sample[0], population[0]]
final_list

[{'City': 'Richardson',
  'State': 'Texas',
  'Population': 118269.0,
  'Percent employed': 70.8,
  'Occupation (MBSA)': 56.8,
  'Occupation (S)': 11.5,
  'Occupation (SO)': 20.2,
  'Occupation (RCM)': 4.2,
  'Occupation (PT)': 7.2,
  'Median household income': 96257.0,
  'Homeownership rate': 50.8,
  'Median home price': 405600.0,
  'Median rent': 1825.0,
  'KMeans': 7,
  'Hierarchical': 1},
 {'Population': 2449.0,
  'Percent employed': 62.4,
  'Occupation (MBSA)': 32.3,
  'Occupation (S)': 16.8,
  'Occupation (SO)': 19.7,
  'Occupation (RCM)': 12.2,
  'Occupation (PT)': 13.7,
  'Median household income': 66176.0,
  'Homeownership rate': 72.4,
  'Median home price': 211500.0,
  'Median rent': 1049.0}]
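For the MCP server, the steps above could be wrapped in a single function. This is only a rough sketch: the function name `city_vs_median`, its parameters, and the error handling are my own assumptions, and the toy frame uses placeholder values rather than rows from labeled.csv.

```python
import pandas as pd


def city_vs_median(df: pd.DataFrame, city: str, categories: list[str]) -> list[dict]:
    """Hypothetical helper: return [city_record, median_record] for one city."""
    # Same filter and conversion as in the cells above.
    matches = df[df['City'] == city].to_dict('records')
    if not matches:
        # Assumed behavior: fail loudly instead of raising IndexError on matches[0].
        raise ValueError(f"No row found for city {city!r}")
    # Median of each numeric category across the whole frame, as a plain dict.
    median_record = df[categories].agg('median').to_dict()
    return [matches[0], median_record]


# Placeholder data for illustration only.
toy = pd.DataFrame({
    'City': ['A', 'B', 'C'],
    'Population': [100.0, 200.0, 600.0],
    'Median rent': [900.0, 1000.0, 1500.0],
})

result = city_vs_median(toy, 'B', ['Population', 'Median rent'])
print(result)
```

With the real data, the call would look like `city_vs_median(df, 'Richardson', categories)`. Like the notebook cells, it takes the first matching row, so a duplicated city name would need extra handling.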