# Overview
This notebook gathers detailed information about a user, their company, and their data assets. The tool uses this input to calculate data valuations and scores based on patterns established from prior valuations. The output summarizes the valuation and benchmarks the data's quality and potential as a percentile showing where the company lands relative to others.

### sample input
```json
{
  "user_info": {
    "full_name": "John Doe",
    "title": "Data Analyst",
    "email": "johndoe@example.com",
  },
  "company_info": {
    "company_name": "DataTech Inc.",
    "year_founded": "2015",
    "business_type": "B2B",
    "operating_locations": ["United States", "Canada", "Germany"],
    "main_industry": "Information Technology",
    "website": "https://www.datatech.com",
    "company_description": "Innovative data solutions for businesses.",
    "annual_revenue": "$50M-$100M",
    "employee_count": "201-500",
    "notable_customers": "BigTech Corp, FinancePro Ltd."
  },
  "data_info": {
    "year_began_collecting_data": "2018",
    "has_end_users": true,
    "end_user_count": 50000,
    "total_records": "10K-50K",
    "attributes_collected": "101-500",
    "collects_pii_or_phi": true,
    "cloud_providers": ["AWS", "Azure"],
    "monthly_cloud_bill": "$10K-$50K",
    "data_types_collected": ["Customer Data", "Behavioral Data", "Transaction Data"],
    "primary_data_usage": ["Marketing Analytics", "Customer Retention", "Product Development"],
    "data_valuation_update_frequency": "Quarterly"
  },
  "data_quality": {
    "duplication": 4,
    "depth": 5,
    "security": 3,
    "completeness": 4
  }
}
```
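Before any scoring, it can help to confirm that a payload contains the fields the notebook reads later. The following is a minimal sketch, not part of the tool: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the list of required fields is an assumption based on which keys the later cells access.

```python
# Hypothetical required-field list, based on the keys read later in this notebook.
REQUIRED_FIELDS = {
    "company_info": ["business_type", "main_industry"],
    "data_info": ["year_began_collecting_data", "total_records",
                  "attributes_collected", "end_user_count"],
    "data_quality": ["duplication", "depth", "completeness"],
}


def missing_fields(payload):
    """Return dotted paths of required fields that are absent from the payload."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        block = payload.get(section)
        if not isinstance(block, dict):
            # The whole section is missing, so report it once.
            missing.append(section)
            continue
        missing.extend(f"{section}.{name}" for name in fields if name not in block)
    return missing
```

An empty list means the payload has everything this sketch checks for; it does not validate value formats such as `"10K-50K"`.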

---

### imports

In [1]:
import os
import pandas as pd
import json

In [2]:
os.environ['AWS_PROFILE'] = 'gd-api-prod'

In [3]:
from api import get_all_valuations

---

### helper functions

### Read all valuations

In [4]:
valuations = get_all_valuations()
print(f'length of valuation: {len(valuations)}')
valuations[0]

length of valuation: 4139

{'id': 6041,
 'org_id': 4109,
 'add_time': '2024-05-09T17:00:57.376Z',
 'valuation': {'Cost': {'value': 899030, 'data_costs': 500000},
  'range': '$899,030.00 - $1,032,062.63',
  'Market': {'OLS': 1017978.01,
   'Lasso': 1012295.11,
   'Ridge': 1016993.7,
   'build': 'build:202404v2',
   'GBoost': 1032062.63,
   'best_model': 'GBoost',
   'test_score': 0.9785,
   'train_size': 7169,
   'valid_data': True,
   'broker_sale': 1548093.94,
   'train_score': 0.9554,
   'industry_set': 2301,
   'private_sale': 1857712.73},
  'Ransom': 0,
  'Royalty': 959557.28},
 'data_results': {'run_type': 'daily',
  'iteration': 1,
  'users_key': 'clienttype',
  'record_pct': '45.42%',
  'data_period': 14,
  'file_source': 'https://reachoutstorage.blob.core.windows.net/gulpdata-storage?sp=rl&st=2023-08-25T18:46:44Z&se=2024-12-02T02:46:44Z&spr=https&sv=2022-11-02&sr=c&sig=PV03NLeNjwj7qVMh8vzfyChRxOanGsc2hSWXS9Ndfqs%3D',
  'geographies': 'United States',
  'records_key': 'scriptlogid',
  'summary_url': 'http

In [5]:
valuation_history = pd.DataFrame(valuations)
valuation_history.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4139 entries, 0 to 4138
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   id              4139 non-null   int64 
 1   org_id          4139 non-null   int64 
 2   add_time        4139 non-null   object
 3   valuation       4139 non-null   object
 4   data_results    4139 non-null   object
 5   scarcity_index  2765 non-null   object
dtypes: int64(2), object(4)
memory usage: 194.1+ KB


In [6]:
valuation_history

Unnamed: 0,id,org_id,add_time,valuation,data_results,scarcity_index
0,6041,4109,2024-05-09T17:00:57.376Z,"{'Cost': {'value': 899030, 'data_costs': 50000...","{'run_type': 'daily', 'iteration': 1, 'users_k...",-119662
1,6221,7220,2024-05-21T17:42:54.138Z,"{'Cost': {'value': 0, 'data_costs': 0}, 'range...","{'run_type': 'vaulting', 'iteration': 1, 'user...",-10580
2,6476,1329,2024-06-05T12:04:57.577Z,"{'Cost': {'value': 453111.12, 'data_costs': 25...","{'run_type': 'daily', 'iteration': 1, 'users_k...",-147946
3,7629,6701,2024-10-03T12:39:59.803Z,"{'Cost': {'value': 0, 'data_costs': 0}, 'range...","{'run_type': 'daily', 'iteration': 1, 'users_k...",-606452
4,6479,4234,2024-06-05T12:07:07.993Z,"{'Cost': {'value': 1798060, 'data_costs': 1000...","{'run_type': 'daily', 'iteration': 1, 'users_k...",-330723
...,...,...,...,...,...,...
4134,8303,438,2024-12-17T04:10:10.701Z,"{'Cost': {'value': 116873.9, 'data_costs': 650...","{'run_type': 'gulpdata', 'iteration': 4, 'user...",-1651911
4135,8304,1329,2024-12-17T12:05:33.058Z,"{'Cost': {'value': 453111.12, 'data_costs': 25...","{'run_type': 'daily', 'iteration': 1, 'users_k...",-703030
4136,8305,1331,2024-12-17T12:05:48.133Z,"{'Cost': {'value': 215767.2, 'data_costs': 120...","{'run_type': 'daily', 'iteration': 1, 'users_k...",-1126082
4137,8306,4234,2024-12-17T12:08:05.691Z,"{'Cost': {'value': 1798060, 'data_costs': 1000...","{'run_type': 'daily', 'iteration': 1, 'users_k...",-1756264


## Expand DataFrame

In [7]:
df = valuation_history.copy()
df.head()

Unnamed: 0,id,org_id,add_time,valuation,data_results,scarcity_index
0,6041,4109,2024-05-09T17:00:57.376Z,"{'Cost': {'value': 899030, 'data_costs': 50000...","{'run_type': 'daily', 'iteration': 1, 'users_k...",-119662
1,6221,7220,2024-05-21T17:42:54.138Z,"{'Cost': {'value': 0, 'data_costs': 0}, 'range...","{'run_type': 'vaulting', 'iteration': 1, 'user...",-10580
2,6476,1329,2024-06-05T12:04:57.577Z,"{'Cost': {'value': 453111.12, 'data_costs': 25...","{'run_type': 'daily', 'iteration': 1, 'users_k...",-147946
3,7629,6701,2024-10-03T12:39:59.803Z,"{'Cost': {'value': 0, 'data_costs': 0}, 'range...","{'run_type': 'daily', 'iteration': 1, 'users_k...",-606452
4,6479,4234,2024-06-05T12:07:07.993Z,"{'Cost': {'value': 1798060, 'data_costs': 1000...","{'run_type': 'daily', 'iteration': 1, 'users_k...",-330723


In [8]:
valuation_expanded = pd.json_normalize(df['valuation'].apply(lambda x: {k: v for k, v in x.items() if k not in ['Market', 'Cost']}))
valuation_expanded.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4139 entries, 0 to 4138
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   range    4139 non-null   object 
 1   Ransom   4139 non-null   int64  
 2   Royalty  4139 non-null   float64
dtypes: float64(1), int64(1), object(1)
memory usage: 97.1+ KB


In [9]:
valuation_cost_expanded = pd.json_normalize(df['valuation'].apply(lambda x: x.get('Cost', {})))
valuation_cost_expanded.head()


Unnamed: 0,value,data_costs
0,899030.0,500000
1,0.0,0
2,453111.12,252000
3,0.0,0
4,1798060.0,1000000


In [10]:
valuation_market_expanded = pd.json_normalize(df['valuation'].apply(lambda x: x.get('Market', {})))
valuation_market_expanded.head()

Unnamed: 0,OLS,Lasso,Ridge,build,GBoost,best_model,test_score,train_size,valid_data,broker_sale,train_score,industry_set,private_sale,MLP,NLPxLasso,NLPxGBoost,Polynomial
0,1017978.01,1012295.11,1016993.7,build:202404v2,1032062.63,GBoost,0.9785,7169,True,1548093.94,0.9554,2301.0,1857712.73,,,,
1,1333453.52,1332653.83,1334218.28,build:202405v1,1445277.32,GBoost,0.9785,7169,True,2167915.98,0.9554,665.0,2601499.18,,,,
2,666151.09,662934.87,666308.59,build:202405v2,814183.75,GBoost,0.9726,9462,True,1221275.62,0.9596,829.0,1465530.75,469335.52,187179.13,197213.62,
3,993019.19,808758.52,992773.47,build:202409v1,982989.2,GBoost,0.9703,5754,True,1474483.8,0.9909,673.0,1769380.56,965158.15,273022.35,288382.02,
4,982102.89,958324.43,979222.91,build:202405v2,1088490.63,GBoost,0.9726,9462,True,1632735.94,0.9596,1079.0,1959283.13,3368800.88,253262.25,284735.11,


In [11]:
# Step 3: Normalize 'data_results' dictionary
data_results_expanded = pd.json_normalize(df['data_results'])
data_results_expanded.head()

Unnamed: 0,run_type,iteration,users_key,record_pct,data_period,file_source,geographies,records_key,summary_url,company_name,...,records_per_user,primary_geography,total_num_records,records_similarity,total_unique_users,essential_attributes,total_num_attributes,rate_of_record,vaulting,data_ end_date
0,daily,1.0,clienttype,45.42%,14.0,https://reachoutstorage.blob.core.windows.net/...,United States,scriptlogid,https://docs.google.com/spreadsheets/d/1rJweRC...,ReachOut Technology Corp.,...,48286153.33,USA,144858460,93.15%,3.0,327.0,491,,,
1,vaulting,1.0,,100.00%,13.0,manual-run,"United States,South America,Other",data_dt,https://docs.google.com/spreadsheets/d/1PjcHyy...,Auctane,...,0.0,Global,8776701062,0.00%,0.0,8.0,8,,,
2,daily,1.0,email,80.52%,54.0,s3://produce-gulpdata-files/mongodb-backups/,United States,id,https://docs.google.com/spreadsheets/d/1crXMYH...,PRoduce LLC,...,42.35,USA,3601292,84.66%,85030.0,304.0,503,,,
3,daily,1.0,email,44.42%,54.0,s3://dopple-gulp-data-vaulting,United States,id,https://docs.google.com/spreadsheets/d/1hp6HxE...,"Dopple, LLC",...,23.43,USA,8792660,78.84%,375198.0,445.0,538,,,
4,daily,1.0,userloginid,68.83%,54.0,s3://moocho-datalake/prod/,United States,userloginid,https://docs.google.com/spreadsheets/d/1dfBSM3...,"Moocho, Inc.",...,1.0,USA,168618098,89.11%,168618098.0,1208.0,1405,,,


In [12]:
df = pd.concat([df.drop(columns=['valuation', 'data_results']), valuation_expanded, valuation_cost_expanded, valuation_market_expanded, data_results_expanded], axis=1)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4139 entries, 0 to 4138
Data columns (total 59 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    4139 non-null   int64  
 1   org_id                4139 non-null   int64  
 2   add_time              4139 non-null   object 
 3   scarcity_index        2765 non-null   object 
 4   range                 4139 non-null   object 
 5   Ransom                4139 non-null   int64  
 6   Royalty               4139 non-null   float64
 7   value                 4139 non-null   float64
 8   data_costs            4139 non-null   int64  
 9   OLS                   4139 non-null   float64
 10  Lasso                 4139 non-null   float64
 11  Ridge                 4139 non-null   float64
 12  build                 3678 non-null   object 
 13  GBoost                4139 non-null   float64
 14  best_model            4089 non-null   object 
 15  test_score           

In [13]:
df.columns

Index(['id', 'org_id', 'add_time', 'scarcity_index', 'range', 'Ransom',
       'Royalty', 'value', 'data_costs', 'OLS', 'Lasso', 'Ridge', 'build',
       'GBoost', 'best_model', 'test_score', 'train_size', 'valid_data',
       'broker_sale', 'train_score', 'industry_set', 'private_sale', 'MLP',
       'NLPxLasso', 'NLPxGBoost', 'Polynomial', 'run_type', 'iteration',
       'users_key', 'record_pct', 'data_period', 'file_source', 'geographies',
       'records_key', 'summary_url', 'company_name', 'data_founded',
       'user_yoy_avg', 'business_type', 'data_end_date', 'total_database',
       'data_start_date', 'pct_data_growth', 'record_dup_rate',
       'records_yoy_avg', 'tagged_industry', 'upload_fraction',
       'company_overview', 'primary_industry', 'records_per_user',
       'primary_geography', 'total_num_records', 'records_similarity',
       'total_unique_users', 'essential_attributes', 'total_num_attributes',
       'rate_of_record', 'vaulting', 'data_ end_date'],
      dty

In [14]:
df.head()

Unnamed: 0,id,org_id,add_time,scarcity_index,range,Ransom,Royalty,value,data_costs,OLS,...,records_per_user,primary_geography,total_num_records,records_similarity,total_unique_users,essential_attributes,total_num_attributes,rate_of_record,vaulting,data_ end_date
0,6041,4109,2024-05-09T17:00:57.376Z,-119662,"$899,030.00 - $1,032,062.63",0,959557.3,899030.0,500000,1017978.01,...,48286153.33,USA,144858460,93.15%,3.0,327.0,491,,,
1,6221,7220,2024-05-21T17:42:54.138Z,-10580,"$0.00 - $102,809,708.34",0,102809700.0,0.0,0,1333453.52,...,0.0,Global,8776701062,0.00%,0.0,8.0,8,,,
2,6476,1329,2024-06-05T12:04:57.577Z,-147946,"$219,327.38 - $814,183.75",0,219327.4,453111.12,252000,666151.09,...,42.35,USA,3601292,84.66%,85030.0,304.0,503,,,
3,7629,6701,2024-10-03T12:39:59.803Z,-606452,"$0.00 - $982,989.20",0,23303.53,0.0,0,993019.19,...,23.43,USA,8792660,78.84%,375198.0,445.0,538,,,
4,6479,4234,2024-06-05T12:07:07.993Z,-330723,"$294,584.08 - $1,798,060.00",0,294584.1,1798060.0,1000000,982102.89,...,1.0,USA,168618098,89.11%,168618098.0,1208.0,1405,,,


### sample input

In [15]:
with open('./sample_input.json', 'r') as f:
    data = json.load(f)

In [16]:
data

{'user_info': {'full_name': 'John Doe',
  'title': 'Data Analyst',
  'email': 'johndoe@example.com'},
 'company_info': {'company_name': 'DataTech Inc.',
  'year_founded': '2015',
  'business_type': 'B2B',
  'operating_locations': ['United States', 'Canada', 'Germany'],
  'main_industry': 'Information Technology',
  'website': 'https://www.datatech.com',
  'company_description': 'Innovative data solutions for businesses.',
  'annual_revenue': '$50M-$100M',
  'employee_count': '201-500',
  'notable_customers': 'BigTech Corp, FinancePro Ltd.'},
 'data_info': {'year_began_collecting_data': '2018',
  'has_end_users': True,
  'end_user_count': 50000,
  'total_records': '10K-50K',
  'attributes_collected': '101-500',
  'collects_pii_or_phi': True,
  'cloud_providers': ['AWS', 'Azure'],
  'monthly_cloud_bill': '$10K-$50K',
  'data_types_collected': ['Customer Data',
   'Behavioral Data',
   'Transaction Data'],
  'primary_data_usage': ['Marketing Analytics',
   'Customer Retention',
   'Produc

### relevant columns
```
company_info['business_type'] → business_type
data_info['year_began_collecting_data'] → data_start_date
data_info['total_records'] → total_num_records
data_info['attributes_collected'] → total_num_attributes
data_info['end_user_count'] → total_unique_users
data_quality['duplication'] → record_dup_rate
data_quality['depth'] → essential_attributes
```
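The mapping above can be expressed as a lookup table so that the renaming happens in one place. This is a sketch only: `FIELD_TO_COLUMN` and `to_column_names` are hypothetical names, and the values are passed through raw, so ranges such as `"10K-50K"` would still need the normalization step that follows.

```python
# (input section, input field) -> valuation column, copied from the mapping above.
FIELD_TO_COLUMN = {
    ("company_info", "business_type"): "business_type",
    ("data_info", "year_began_collecting_data"): "data_start_date",
    ("data_info", "total_records"): "total_num_records",
    ("data_info", "attributes_collected"): "total_num_attributes",
    ("data_info", "end_user_count"): "total_unique_users",
    ("data_quality", "duplication"): "record_dup_rate",
    ("data_quality", "depth"): "essential_attributes",
}


def to_column_names(payload):
    """Flatten a nested payload into {column_name: raw_value}; absent fields are skipped."""
    flat = {}
    for (section, field), column in FIELD_TO_COLUMN.items():
        value = payload.get(section, {}).get(field)
        if value is not None:
            flat[column] = value
    return flat
```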

#### Normalize input

In [17]:
from decimal import Decimal

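def text_to_num(text):
    """Convert shorthand such as '10K' or '5M' to an integer."""
    # Power of ten for each magnitude suffix.
    d = {
        'K': 3,
        'M': 6,
        'B': 9
    }
    if text[-1] in d:
        num, magnitude = text[:-1], text[-1]
        return int(Decimal(num) * 10 ** d[magnitude])
    else:
        return int(Decimal(text))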

In [18]:
def clean_range(records):
    total = []
    for rec in records.split('-'):
        total.append(text_to_num(rec))
    return total
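
`text_to_num` hands the text straight to `Decimal`, so currency-style ranges in the sample input such as `"$50M-$100M"` (`annual_revenue`) or `"$10K-$50K"` (`monthly_cloud_bill`) would fail to parse. A minimal sketch of a more tolerant variant follows; `parse_amount` and `parse_range` are hypothetical names, and stripping `$` and `,` is an assumption about the formats the tool will receive.

```python
from decimal import Decimal

MAGNITUDES = {"K": 3, "M": 6, "B": 9}


def parse_amount(text):
    """Convert '10K', '$50M' or '1,500' to an int."""
    # Drop currency symbols and thousands separators before parsing.
    cleaned = text.strip().replace("$", "").replace(",", "").upper()
    if cleaned and cleaned[-1] in MAGNITUDES:
        return int(Decimal(cleaned[:-1]) * 10 ** MAGNITUDES[cleaned[-1]])
    return int(Decimal(cleaned))


def parse_range(text):
    """Split 'LOW-HIGH' into [low, high] as ints."""
    return [parse_amount(part) for part in text.split("-")]
```

For example, `parse_range("$10K-$50K")` gives `[10000, 50000]`. Open-ended values such as `"500+"` are not handled here.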

In [19]:
from datetime import date

def data_length(year_began_collecting_data):
    current_date = date.today()
    current_year = current_date.year

    data_period = current_year - int(year_began_collecting_data)
    return data_period

In [20]:
# Normalize input

input_data = {
    "total_records": clean_range(data['data_info']['total_records']),
    "end_user_count":  data['data_info']['end_user_count'],
    "record_dup_rate": 1 - (data['data_quality']['duplication'] / 5),  # lower duplication = higher score
    "industry": data['company_info']['main_industry'],
    "data_length": data_length(data['data_info']['year_began_collecting_data']),
    "total_num_attributes": data['data_info']['attributes_collected'].split('-'),
}
input_data

{'total_records': [10000, 50000],
 'end_user_count': 50000,
 'record_dup_rate': 0.19999999999999996,
 'industry': 'Information Technology',
 'data_length': 6,
 'total_num_attributes': ['101', '500']}

In [21]:
valuations_df = df.copy()
valuations_df.head(2)

Unnamed: 0,id,org_id,add_time,scarcity_index,range,Ransom,Royalty,value,data_costs,OLS,...,records_per_user,primary_geography,total_num_records,records_similarity,total_unique_users,essential_attributes,total_num_attributes,rate_of_record,vaulting,data_ end_date
0,6041,4109,2024-05-09T17:00:57.376Z,-119662,"$899,030.00 - $1,032,062.63",0,959557.3,899030.0,500000,1017978.01,...,48286153.33,USA,144858460,93.15%,3.0,327.0,491,,,
1,6221,7220,2024-05-21T17:42:54.138Z,-10580,"$0.00 - $102,809,708.34",0,102809700.0,0.0,0,1333453.52,...,0.0,Global,8776701062,0.00%,0.0,8.0,8,,,


In [22]:
valuations_df['primary_industry'].unique()

array(['Business Services', 'Transportation and Public Utilities',
       'Retail Trade', 'Finance Insurance and Real Estate',
       'Health Services', 'Entertainment and Recreation', 'Manufacturing',
       'Information Technology', None], dtype=object)

In [23]:
# Filter relevant industry data
industry_set = valuations_df[valuations_df['primary_industry'] == input_data['industry']]
industry_set.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, 986 to 4106
Data columns (total 59 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    5 non-null      int64  
 1   org_id                5 non-null      int64  
 2   add_time              5 non-null      object 
 3   scarcity_index        5 non-null      object 
 4   range                 5 non-null      object 
 5   Ransom                5 non-null      int64  
 6   Royalty               5 non-null      float64
 7   value                 5 non-null      float64
 8   data_costs            5 non-null      int64  
 9   OLS                   5 non-null      float64
 10  Lasso                 5 non-null      float64
 11  Ridge                 5 non-null      float64
 12  build                 5 non-null      object 
 13  GBoost                5 non-null      float64
 14  best_model            5 non-null      object 
 15  test_score            5 non

In [24]:
industry_set['GBoost']

986      378548.14
1553     917133.35
2091     509874.10
4105    1127459.18
4106     509874.10
Name: GBoost, dtype: float64

In [40]:
industry_set[['org_id', 'total_num_records', 'GBoost', 'range']]

Unnamed: 0,org_id,total_num_records,GBoost,range
986,7214,24000,378548.14,"$0.00 - $378,548.14"
1553,7248,72583428,917133.35,"$0.00 - $917,133.35"
2091,7314,74992,509874.1,"$0.00 - $509,874.10"
4105,7320,221378,1127459.18,"$0.00 - $1,127,459.18"
4106,7314,74992,509874.1,"$0.00 - $509,874.10"


In [46]:
rec_range_set = industry_set[industry_set['total_num_records'].between(
    input_data['total_records'][0], input_data['total_records'][1]
    )]
rec_range_set

Unnamed: 0,id,org_id,add_time,scarcity_index,range,Ransom,Royalty,value,data_costs,OLS,...,records_per_user,primary_geography,total_num_records,records_similarity,total_unique_users,essential_attributes,total_num_attributes,rate_of_record,vaulting,data_ end_date
986,6642,7214,2024-06-24T14:21:47.888Z,-40570,"$0.00 - $378,548.14",0,41123.88,0.0,0,306653.68,...,0.0,Global,24000,53.59%,0.0,13.0,16,,,


In [45]:
# Calculate benchmarks
average_valuation = industry_set['GBoost'].mean()
print('average_valuation: ', average_valuation)
min_valuation = industry_set['GBoost'].min()
print('min_valuation: ', min_valuation)
max_valuation = industry_set['GBoost'].max()
print('max_valuation: ', max_valuation)
top_25 = industry_set['GBoost'].quantile(0.75)
print('top_25: ', top_25)


average_valuation:  688577.774
min_valuation:  378548.14
max_valuation:  1127459.18
top_25:  917133.35
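
With only five rows in the industry set, these benchmarks are easy to recompute by hand. The sketch below does so without pandas, using the five `GBoost` values shown above; `linear_quantile` is a hypothetical helper that interpolates linearly between order statistics, which is assumed here to match the default behaviour of the `quantile` call in the cell.

```python
from statistics import fmean

# GBoost values copied from the industry_set output above.
gboost = [378548.14, 917133.35, 509874.10, 1127459.18, 509874.10]


def linear_quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


benchmarks = {
    "average": fmean(gboost),
    "min": min(gboost),
    "max": max(gboost),
    "top_25": linear_quantile(gboost, 0.75),
}
```

Rows 2091 and 4106 share `org_id` 7314 and carry identical values, so de-duplicating by organization before computing benchmarks may be worth considering for a set this small.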


### compute score

In [28]:
input_data

{'total_records': [10000, 50000],
 'end_user_count': 50000,
 'record_dup_rate': 0.19999999999999996,
 'industry': 'Information Technology',
 'data_length': 6,
 'total_num_attributes': ['101', '500']}

In [30]:
input_data['data_length']

6

In [54]:
from scipy.stats import percentileofscore


In [67]:
percentileofscore(df['total_num_records'], input_data['total_records'][0], kind='rank')


np.float64(6.257550132882338)

In [68]:
percentileofscore(df['total_num_records'], input_data['total_records'][1], kind='rank')


np.float64(12.732544092776033)

In [50]:
# Step 4: Calculate percentile rank for record size
# NOTE: this is the mean record count of the matched rows times 100, not a percentile.
percentile_rank = rec_range_set['total_num_records'].mean() * 100
percentile_rank

np.float64(24000.0)

In [53]:
print(f"Percentile Rank: {percentile_rank:.2f}%")


Percentile Rank: 24000.00%


In [52]:
import numpy as np
# Step 5: Adjust for data quality
quality_score = np.mean([
    data['data_quality']['duplication'],
    data['data_quality']['depth'],
    data['data_quality']['completeness']
]) / 5  # Normalize to 0-1 scale
quality_score

np.float64(0.8666666666666666)

In [63]:

# Step 6: Format output
output = {
    'Valuation': {
        'average_valuation': f"${average_valuation:,.2f}",
        'min': f"${min_valuation:,.2f}",
        'max': f"${max_valuation:,.2f}",
        'top_25': f"${top_25:,.2f}"
    },
    'score': {
        'record_size': f"{input_data['total_records'][0]:,} - {input_data['total_records'][1]:,}",
        'data_length': f"{input_data['data_length']} years",
        'percentile_rank': f"{percentile_rank:.2f}%",
        'quality_adjusted_score': f"{quality_score * 100:.2f}%"
    }
}


In [64]:
output

{'Valuation': {'average_valuation': '$688,577.77',
  'min': '$378,548.14',
  'max': '$1,127,459.18',
  'top_25': '$917,133.35'},
 'score': {'record_size': '10,000 - 50,000',
  'data_length': '6 years',
  'percentile_rank': '24000.00%',
  'quality_adjusted_score': '86.67%'}}

## output
Background on Gulp: 3,800+ valuations, $3B+ data valued, 3T+ records valued, 150K+ market comps behind the valuations


Background on the method: Our data value calculator draws on Gulp Data's prior in-house valuations of data assets similar to yours. Unlike our market-comp models, which are based on actual market data and transactions, it does not rely on real-time market comparisons; instead, it applies the patterns and valuation criteria we have established from assessing comparable data types. For a full data valuation, TEXT FISH to 1-800-iam-fish :)


In [36]:
output = {}

In [37]:
output['Valuation'] = {
    'average_valuation': '',
    'min': '',
    'max': '',
    'top_25': ''
}

In [38]:
output['score'] = {
    'record_size': '',
    'data_length': 'years',
    'percentile_rank': ''
}

In [39]:
output

{'Valuation': {'average_valuation': '', 'min': '', 'max': '', 'top_25': ''},
 'score': {'record_size': '', 'data_length': 'years', 'percentile_rank': ''}}