<a href="https://colab.research.google.com/github/Mathavk1606/Real-Estate-Demand-Prediction/blob/main/Real_Estate_Demand_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Real Estate Demand Prediction**

## **📋 Table of Contents**

1. Introduction & Problem Statement
2. Dataset Overview
3. Exploratory Data Analysis
4. Feature Engineering
5. Model Development
6. Results & Insights
7. Conclusions

## **🎯 Introduction & Problem Statement**
In China’s fast-evolving housing market, accurately forecasting residential demand is vital for investment and development decisions. This competition challenges you to develop a machine learning model that predicts monthly sales of newly launched private residential projects in each sector, using historical transaction data, market conditions, and other relevant features.

# Real Estate and City Data Dictionary

## Table of Contents
1. [Pre-owned House Transactions](#pre-owned-house-transactions)
2. [Pre-owned House Transactions (Nearby Sectors)](#pre-owned-house-transactions-nearby-sectors)
3. [Land Transactions](#land-transactions)
4. [Land Transactions (Nearby Sectors)](#land-transactions-nearby-sectors)
5. [New House Transactions](#new-house-transactions)
6. [New House Transactions (Nearby Sectors)](#new-house-transactions-nearby-sectors)
7. [Sector POI (Points of Interest)](#sector-poi-points-of-interest)
8. [City Search Index](#city-search-index)
9. [City Indexes](#city-indexes)

---

## Pre-owned House Transactions
**File:** `train/pre_owned_house_transactions.csv`

| Field Name | Description | Unit |
|------------|-------------|------|
| `month` | The month of the transaction | Date (YYYY-MM) |
| `sector` | The specific geographic sector where the transaction occurred | Text |
| `area_pre_owned_house_transactions` | The total area of pre-owned house transactions | Square meters (m²) |
| `amount_pre_owned_house_transactions` | The total monetary value of pre-owned house transactions | 10,000 yuan |
| `num_pre_owned_house_transactions` | The total number of pre-owned house transactions | Count |
| `price_pre_owned_house_transactions` | The average price per square meter of pre-owned house transactions | Yuan per m² |
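
The units in this table suggest a simple consistency check: the amount is in 10,000 yuan and the area in m², so a price in yuan per m² would be the amount times 10,000 divided by the area. The sketch below assumes the price column is derived this way, which the data dictionary does not confirm; the function name and the sample values are hypothetical.

```python
import math


def price_per_sqm(amount_10k_yuan: float, area_sqm: float) -> float:
    """Return yuan per m² from an amount in 10,000 yuan and an area in m².

    Assumption: price = amount * 10,000 / area. Rows with no traded
    area return NaN instead of raising ZeroDivisionError.
    """
    if area_sqm <= 0:
        return math.nan
    return amount_10k_yuan * 10_000 / area_sqm


# Hypothetical row: 1,200 (x 10,000 yuan) traded over 600 m².
example_price = price_per_sqm(1200.0, 600.0)
print(example_price)  # 20000.0
```

Comparing this value with `price_pre_owned_house_transactions` on real rows is a quick way to see whether the assumed relationship holds or whether the price is averaged differently.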

---

## Pre-owned House Transactions (Nearby Sectors)
**File:** `train/pre_owned_house_transactions_nearby_sectors.csv`

| Field Name | Description | Unit |
|------------|-------------|------|
| `month` | The month of the transaction | Date (YYYY-MM) |
| `sector` | The specific geographic sector of interest | Text |
| `area_pre_owned_house_transactions_nearby_sectors` | The total area of pre-owned house transactions in nearby sectors | Square meters (m²) |
| `amount_pre_owned_house_transactions_nearby_sectors` | The total monetary value of pre-owned house transactions in nearby sectors | 10,000 yuan |
| `num_pre_owned_house_transactions_nearby_sectors` | The total number of pre-owned house transactions in nearby sectors | Count |
| `price_pre_owned_house_transactions_nearby_sectors` | The average price per square meter of pre-owned house transactions in nearby sectors | Yuan per m² |

---

## Land Transactions
**File:** `train/land_transactions.csv`

| Field Name | Description | Unit |
|------------|-------------|------|
| `month` | The month of the transaction | Date (YYYY-MM) |
| `sector` | The specific geographic sector where the land transaction occurred | Text |
| `num_land_transactions` | The total number of land transactions | Count |
| `construction_area` | The total area of land designated for construction | Square meters (m²) |
| `planned_building_area` | The total planned building area on the transacted land | Square meters (m²) |
| `transaction_amount` | The total monetary value of land transactions | 10,000 yuan |

---

## Land Transactions (Nearby Sectors)
**File:** `train/land_transactions_nearby_sectors.csv`

| Field Name | Description | Unit |
|------------|-------------|------|
| `month` | The month of the transaction | Date (YYYY-MM) |
| `sector` | The specific geographic sector of interest | Text |
| `num_land_transactions_nearby_sectors` | The total number of land transactions in nearby sectors | Count |
| `construction_area_nearby_sectors` | The total area of land designated for construction in nearby sectors | Square meters (m²) |
| `planned_building_area_nearby_sectors` | The total planned building area on transacted land in nearby sectors | Square meters (m²) |
| `transaction_amount_nearby_sectors` | The total monetary value of land transactions in nearby sectors | 10,000 yuan |

---

## New House Transactions
**File:** `train/new_house_transactions.csv`

| Field Name | Description | Unit |
|------------|-------------|------|
| `month` | The month of the transaction | Date (YYYY-MM) |
| `sector` | The specific geographic sector where the new house transaction occurred | Text |
| `num_new_house_transactions` | The total number of new house transactions | Count |
| `area_new_house_transactions` | The total area of new house transactions | Square meters (m²) |
| `price_new_house_transactions` | The average price per square meter of new house transactions | Yuan per m² |
| `amount_new_house_transactions` | The total monetary value of new house transactions | 10,000 yuan |
| `area_per_unit_new_house_transactions` | The average area per new house transaction unit | m² per unit |
| `total_price_per_unit_new_house_transactions` | The average total price per new house transaction unit | 10,000 yuan per unit |
| `num_new_house_available_for_sale` | The total number of new houses available for sale | Count |
| `area_new_house_available_for_sale` | The total area of new houses available for sale | Square meters (m²) |
| `period_new_house_sell_through` | The estimated time to sell all available new houses | Months |
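
Several fields in this table look like ratios of the others. The following sketch recomputes them on toy rows to show how the units line up. The formulas are assumptions, not the dataset's documented definitions: in particular, `period_new_house_sell_through` may be based on a trailing average of sales rather than a single month. All values and the `_check` column names are hypothetical.

```python
import pandas as pd

# Toy rows with hypothetical values; column names follow the table above.
new_house = pd.DataFrame({
    "month": ["2023-01", "2023-02"],
    "sector": ["sector 1", "sector 1"],
    "num_new_house_transactions": [50, 40],
    "area_new_house_transactions": [5000.0, 4400.0],     # m²
    "amount_new_house_transactions": [10000.0, 9240.0],  # 10,000 yuan
    "num_new_house_available_for_sale": [200, 160],
})

# Treat months with zero transactions as missing to avoid infinite ratios.
units = new_house["num_new_house_transactions"].replace(0, float("nan"))

derived = new_house.assign(
    # m² per unit
    area_per_unit_check=new_house["area_new_house_transactions"] / units,
    # 10,000 yuan per unit
    total_price_per_unit_check=new_house["amount_new_house_transactions"] / units,
    # months needed to clear inventory at the current month's sales pace
    sell_through_check=new_house["num_new_house_available_for_sale"] / units,
)
print(derived.filter(like="_check"))
```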

---

## New House Transactions (Nearby Sectors)
**File:** `train/new_house_transactions_nearby_sectors.csv`

| Field Name | Description | Unit |
|------------|-------------|------|
| `month` | The month of the transaction | Date (YYYY-MM) |
| `sector` | The specific geographic sector of interest | Text |
| `num_new_house_transactions_nearby_sectors` | The total number of new house transactions in nearby sectors | Count |
| `area_new_house_transactions_nearby_sectors` | The total area of new house transactions in nearby sectors | Square meters (m²) |
| `price_new_house_transactions_nearby_sectors` | The average price per square meter of new house transactions in nearby sectors | Yuan per m² |
| `amount_new_house_transactions_nearby_sectors` | The total monetary value of new house transactions in nearby sectors | 10,000 yuan |
| `area_per_unit_new_house_transactions_nearby_sectors` | The average area per new house transaction unit in nearby sectors | m² per unit |
| `total_price_per_unit_new_house_transactions_nearby_sectors` | The average total price per new house transaction unit in nearby sectors | 10,000 yuan per unit |
| `num_new_house_available_for_sale_nearby_sectors` | The total number of new houses available for sale in nearby sectors | Count |
| `area_new_house_available_for_sale_nearby_sectors` | The total area of new houses available for sale in nearby sectors | Square meters (m²) |
| `period_new_house_sell_through_nearby_sectors` | The estimated time to sell all available new houses in nearby sectors | Months |

---

## Sector POI (Points of Interest)
**File:** `train/sector_POI.csv`

### Basic Sector Information

| Field Name | Description | Unit |
|------------|-------------|------|
| `sector` | The specific geographic sector | Text |
| `sector_coverage` | The geographical extent or area covered by the sector | Area unit |
| `population_scale` | The general size of the population within the sector | Scale/Category |
| `residential_area` | The presence or extent of residential zones within the sector | Indicator |
| `office_building` | The presence or extent of office buildings within the sector | Indicator |
| `commercial_area` | The presence or extent of commercial zones within the sector | Indicator |

### Population Metrics

| Field Name | Description | Unit |
|------------|-------------|------|
| `resident_population` | The number of people residing in the sector | Count |
| `office_population` | The number of people working in offices within the sector | Count |

### Commercial and Retail

| Field Name | Description | Unit |
|------------|-------------|------|
| `number_of_shops` | The total count of shops in the sector | Count |
| `catering` | The number or density of catering establishments | Count |
| `retail` | The number or density of retail establishments | Count |
| `hotel` | The number or density of hotel establishments | Count |
| `rentable_shops` | The number of shops available for rent | Count |
| `surrounding_housing_average_price` | The average price of housing in the surrounding area | Yuan |
| `surrounding_shop_average_rent` | The average rent of shops in the surrounding area | Yuan |

### Transportation

| Field Name | Description | Unit |
|------------|-------------|------|
| `transportation_station` | The number or density of transportation stations | Count |
| `bus_station_cnt` | The count of bus stations | Count |
| `subway_station_cnt` | The count of subway stations | Count |
| `transportation_facilities_service_bus_station` | The presence or density of bus stations | Indicator |
| `transportation_facilities_service_subway_station` | The presence or density of subway stations | Indicator |
| `transportation_facilities_service_airport_related` | The presence or density of airport-related facilities | Indicator |
| `transportation_facilities_service_port_terminal` | The presence or density of port or terminal facilities | Indicator |
| `transportation_facilities_service_train_station` | The presence or density of train stations | Indicator |
| `transportation_facilities_service_light_rail_station` | The presence or density of light rail stations | Indicator |
| `transportation_facilities_service_long_distance_bus_station` | The presence or density of long-distance bus stations | Indicator |

### Education Facilities

| Field Name | Description | Unit |
|------------|-------------|------|
| `education` | The number or density of educational facilities | Count |
| `education_training_school_education_middle_school` | The number or density of middle schools | Count |
| `education_training_school_education_primary_school` | The number or density of primary schools | Count |
| `education_training_school_education_kindergarten` | The number or density of kindergartens | Count |
| `education_training_school_education_research_institution` | The number or density of research institutions | Count |

### Leisure and Entertainment

| Field Name | Description | Unit |
|------------|-------------|------|
| `leisure_and_entertainment` | The number or density of leisure and entertainment venues | Count |
| `leisure_entertainment_entertainment_venue_game_arcade` | The number or density of game arcades | Count |
| `leisure_entertainment_entertainment_venue_party_house` | The number or density of party houses | Count |
| `leisure_entertainment_cultural_venue_cultural_palace` | The number or density of cultural palaces | Count |

### Medical and Health Facilities

| Field Name | Description | Unit |
|------------|-------------|------|
| `medical_health` | The number or density of general medical and health facilities | Count |
| `medical_health_specialty_hospital` | The number or density of specialty hospitals | Count |
| `medical_health_tcm_hospital` | The number or density of Traditional Chinese Medicine (TCM) hospitals | Count |
| `medical_health_physical_examination_institution` | The number or density of physical examination institutions | Count |
| `medical_health_veterinary_station` | The number or density of veterinary stations | Count |
| `medical_health_pharmaceutical_healthcare` | The number or density of pharmaceutical healthcare providers | Count |
| `medical_health_rehabilitation_institution` | The number or density of rehabilitation institutions | Count |
| `medical_health_first_aid_center` | The number or density of first aid centers | Count |
| `medical_health_blood_donation_station` | The number or density of blood donation stations | Count |
| `medical_health_disease_prevention_institution` | The number or density of disease prevention institutions | Count |
| `medical_health_general_hospital` | The number or density of general hospitals | Count |
| `medical_health_clinic` | The number or density of clinics | Count |

### Office and Industrial Buildings

| Field Name | Description | Unit |
|------------|-------------|------|
| `office_building_industrial_building_industrial_building` | The number or density of industrial buildings used as office spaces | Count |

### Store Categories

| Field Name | Description | Unit |
|------------|-------------|------|
| `number_of_leisure_and_entertainment_stores` | The count of leisure and entertainment stores | Count |
| `number_of_other_stores` | The count of miscellaneous other stores | Count |
| `number_of_other_anchor_stores` | The count of other major or anchor stores | Count |
| `number_of_home_appliance_stores` | The count of home appliance stores | Count |
| `number_of_skincare_cosmetics_stores` | The count of skincare and cosmetics stores | Count |
| `number_of_fashion_stores` | The count of fashion stores | Count |
| `number_of_service_stores` | The count of service-oriented stores | Count |
| `number_of_jewelry_stores` | The count of jewelry stores | Count |
| `number_of_lifestyle_leisure_stores` | The count of lifestyle and leisure stores | Count |
| `number_of_supermarket_convenience_stores` | The count of supermarkets and convenience stores | Count |
| `number_of_catering_food_stores` | The count of catering and food stores | Count |

### Commercial Building Types

| Field Name | Description | Unit |
|------------|-------------|------|
| `number_of_residential_commercial` | The count of commercial establishments within residential areas | Count |
| `number_of_office_building_commercial` | The count of commercial establishments within office buildings | Count |
| `number_of_commercial_buildings` | The count of dedicated commercial buildings | Count |
| `number_of_hypermarkets` | The count of hypermarkets | Count |
| `number_of_department_stores` | The count of department stores | Count |
| `number_of_shopping_centers` | The count of shopping centers | Count |
| `number_of_hotel_commercial` | The count of commercial establishments within hotels | Count |

### Shopping Mall Classifications

| Field Name | Description | Unit |
|------------|-------------|------|
| `number_of_third_tier_shopping_malls_in_business_district` | The count of third-tier shopping malls within the business district | Count |
| `number_of_second_tier_shopping_malls_in_business_district` | The count of second-tier shopping malls within the business district | Count |
| `number_of_city_winner_malls` | The count of high-performing "city winner" malls | Count |
| `number_of_shopping_malls_with_street_facing_shops` | The count of shopping malls featuring street-facing shops | Count |
| `number_of_unranked_malls` | The count of shopping malls without a specific ranking | Count |
| `number_of_community_malls` | The count of community-focused malls | Count |
| `number_of_community_winner_malls` | The count of high-performing "community winner" malls | Count |
| `number_of_key_focus_malls` | The count of shopping malls identified for key focus | Count |

### Density Metrics

All density metrics represent the concentration or density of the corresponding feature within the sector coverage area.

#### Basic Density Indicators

| Field Name | Description | Unit |
|------------|-------------|------|
| `population_scale_dense` | The density of the population scale within the sector | Density measure |
| `residential_area_dense` | The density of residential areas within the sector | Density measure |
| `office_building_dense` | The density of office buildings within the sector | Density measure |
| `commercial_area_dense` | The density of commercial areas within the sector | Density measure |
| `resident_population_dense` | The density of the resident population within the sector | Density measure |
| `office_population_dense` | The density of the office population within the sector | Density measure |
| `number_of_shops_dense` | The density of shops within the sector | Density measure |
| `catering_dense` | The density of catering establishments within the sector | Density measure |
| `retail_dense` | The density of retail establishments within the sector | Density measure |
| `hotel_dense` | The density of hotel establishments within the sector | Density measure |
| `transportation_station_dense` | The density of transportation stations within the sector | Density measure |
| `education_dense` | The density of educational facilities within the sector | Density measure |
| `leisure_and_entertainment_dense` | The density of leisure and entertainment venues within the sector | Density measure |
| `bus_station_cnt_dense` | The density of bus stations | Density measure |
| `subway_station_cnt_dense` | The density of subway stations | Density measure |
| `rentable_shops_dense` | The density of rentable shops | Density measure |

#### Store Density Indicators

| Field Name | Description | Unit |
|------------|-------------|------|
| `leisure_entertainment_stores_dense` | The density of leisure and entertainment stores | Density measure |
| `other_stores_dense` | The density of miscellaneous other stores | Density measure |
| `other_anchor_stores_dense` | The density of other major or anchor stores | Density measure |
| `home_appliance_stores_dense` | The density of home appliance stores | Density measure |
| `skincare_cosmetics_stores_dense` | The density of skincare and cosmetics stores | Density measure |
| `fashion_stores_dense` | The density of fashion stores | Density measure |
| `service_stores_dense` | The density of service-oriented stores | Density measure |
| `jewelry_stores_dense` | The density of jewelry stores | Density measure |
| `lifestyle_leisure_stores_dense` | The density of lifestyle and leisure stores | Density measure |
| `supermarket_convenience_stores_dense` | The density of supermarkets and convenience stores | Density measure |
| `catering_food_stores_dense` | The density of catering and food stores | Density measure |

#### Commercial Building Density

| Field Name | Description | Unit |
|------------|-------------|------|
| `residential_commercial_dense` | The density of commercial establishments within residential areas | Density measure |
| `office_building_commercial_dense` | The density of commercial establishments within office buildings | Density measure |
| `commercial_buildings_dense` | The density of dedicated commercial buildings | Density measure |
| `hypermarkets_dense` | The density of hypermarkets | Density measure |
| `department_stores_dense` | The density of department stores | Density measure |
| `shopping_centers_dense` | The density of shopping centers | Density measure |
| `hotel_commercial_dense` | The density of commercial establishments within hotels | Density measure |

#### Mall Density

| Field Name | Description | Unit |
|------------|-------------|------|
| `third_tier_shopping_malls_in_business_district_dense` | The density of third-tier shopping malls within the business district | Density measure |
| `second_tier_shopping_malls_in_business_district_dense` | The density of second-tier shopping malls within the business district | Density measure |
| `city_winner_malls_dense` | The density of high-performing "city winner" malls | Density measure |
| `shopping_malls_with_street_facing_shops_dense` | The density of shopping malls featuring street-facing shops | Density measure |
| `unranked_malls_dense` | The density of shopping malls without a specific ranking | Density measure |
| `community_malls_dense` | The density of community-focused malls | Density measure |
| `community_winner_malls_dense` | The density of high-performing "community winner" malls | Density measure |
| `key_focus_malls_dense` | The density of shopping malls identified for key focus | Density measure |

#### Transportation Density

| Field Name | Description | Unit |
|------------|-------------|------|
| `transportation_facilities_service_bus_station_dense` | The density of bus stations | Density measure |
| `transportation_facilities_service_subway_station_dense` | The density of subway stations | Density measure |
| `transportation_facilities_service_airport_related_dense` | The density of airport-related facilities | Density measure |
| `transportation_facilities_service_port_terminal_dense` | The density of port or terminal facilities | Density measure |
| `transportation_facilities_service_train_station_dense` | The density of train stations | Density measure |
| `transportation_facilities_service_light_rail_station_dense` | The density of light rail stations | Density measure |
| `transportation_facilities_service_long_distance_bus_station_dense` | The density of long-distance bus stations | Density measure |

#### Entertainment and Culture Density

| Field Name | Description | Unit |
|------------|-------------|------|
| `leisure_entertainment_entertainment_venue_game_arcade_dense` | The density of game arcades | Density measure |
| `leisure_entertainment_entertainment_venue_party_house_dense` | The density of party houses | Density measure |
| `leisure_entertainment_cultural_venue_cultural_palace_dense` | The density of cultural palaces | Density measure |

#### Industrial and Office Density

| Field Name | Description | Unit |
|------------|-------------|------|
| `office_building_industrial_building_industrial_building_dense` | The density of industrial buildings used as office spaces | Density measure |

#### Medical Facility Density

| Field Name | Description | Unit |
|------------|-------------|------|
| `medical_health_dense` | The density of general medical and health facilities | Density measure |
| `medical_health_specialty_hospital_dense` | The density of specialty hospitals | Density measure |
| `medical_health_tcm_hospital_dense` | The density of Traditional Chinese Medicine (TCM) hospitals | Density measure |
| `medical_health_physical_examination_institution_dense` | The density of physical examination institutions | Density measure |
| `medical_health_veterinary_station_dense` | The density of veterinary stations | Density measure |
| `medical_health_pharmaceutical_healthcare_dense` | The density of pharmaceutical healthcare providers | Density measure |
| `medical_health_rehabilitation_institution_dense` | The density of rehabilitation institutions | Density measure |
| `medical_health_first_aid_center_dense` | The density of first aid centers | Density measure |
| `medical_health_blood_donation_station_dense` | The density of blood donation stations | Density measure |
| `medical_health_disease_prevention_institution_dense` | The density of disease prevention institutions | Density measure |
| `medical_health_general_hospital_dense` | The density of general hospitals | Density measure |
| `medical_health_clinic_dense` | The density of clinics | Density measure |

#### Education Facility Density

| Field Name | Description | Unit |
|------------|-------------|------|
| `education_training_school_education_middle_school_dense` | The density of middle schools | Density measure |
| `education_training_school_education_primary_school_dense` | The density of primary schools | Density measure |
| `education_training_school_education_kindergarten_dense` | The density of kindergartens | Density measure |
| `education_training_school_education_research_institution_dense` | The density of research institutions | Density measure |

---

## City Search Index
**File:** `train/city_search_index.csv`

| Field Name | Description | Unit |
|------------|-------------|------|
| `month` | The month the search data was recorded | Date (YYYY-MM) |
| `keyword` | The specific search term | Text |
| `source` | The origin or platform of the search data | Text |
| `search_volume` | The total number of searches for the keyword | Count |
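
Because this file is in long format (one row per month, keyword, and source), it usually needs reshaping before it can be joined to the monthly transaction tables. Here is a minimal sketch using `pandas.DataFrame.pivot_table`, summing volumes across sources. The keywords, source names, and volumes are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical long-format rows matching the columns in the table above.
search = pd.DataFrame({
    "month": ["2023-01", "2023-01", "2023-01", "2023-02"],
    "keyword": ["mortgage", "mortgage", "new homes", "mortgage"],
    "source": ["pc", "mobile", "pc", "pc"],
    "search_volume": [100, 250, 80, 120],
})

# One row per month, one column per keyword, volumes summed over sources.
wide = search.pivot_table(
    index="month",
    columns="keyword",
    values="search_volume",
    aggfunc="sum",
    fill_value=0,
)
print(wide)
```

Summing over `source` is one design choice; keeping sources as separate columns (for example by passing `columns=["keyword", "source"]`) preserves more detail at the cost of a wider table.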

---

## City Indexes
**File:** `train/city_indexes.csv`

### Population Demographics

| Field Name | Description | Unit |
|------------|-------------|------|
| `city_indicator_data_year` | The year to which the city indicator data pertains | Year (YYYY) |
| `year_end_registered_population_10k` | The registered population at year-end | 10,000 persons |
| `total_households_10k` | The total number of households | 10,000 households |
| `year_end_resident_population_10k` | The permanent resident population at year-end | 10,000 persons |
| `national_year_end_total_population_10k` | The national total population at year-end | 10,000 persons |
| `resident_registered_ratio` | The ratio of permanent residents to registered population | Ratio |
| `under_18_10k` | The population under 18 years old | 10,000 persons |
| `18_60_years_10k` | The population aged 18 to 60 years old | 10,000 persons |
| `over_60_years_10k` | The population over 60 years old | 10,000 persons |
| `total` | The total population count | Count |
| `under_18_percent` | The percentage of the population under 18 years old | Percentage |
| `18_60_years_percent` | The percentage of the population aged 18 to 60 years old | Percentage |
| `over_60_years_percent` | The percentage of the population over 60 years old | Percentage |
| `national_population_share` | The city's share of the national population | Percentage/Ratio |

### Employment

| Field Name | Description | Unit |
|------------|-------------|------|
| `year_end_total_employed_population_10k` | The total employed population at year-end | 10,000 persons |
| `year_end_urban_non_private_employees_10k` | The number of urban non-private unit employees at year-end | 10,000 persons |
| `private_individual_and_other_employees_10k` | The number of private, individual, and other employees | 10,000 persons |
| `private_individual_ratio` | The proportion of private and individual employees | Ratio |
| `employed_population` | The total number of employed individuals | Count |
| `primary_industry_percent` | The percentage of the employed population in the primary industry | Percentage |
| `secondary_industry_percent` | The percentage of the employed population in the secondary industry | Percentage |
| `tertiary_industry_percent` | The percentage of the employed population in the tertiary industry | Percentage |
| `white_collar_service_vs_blue_collar_manufacturing_ratio` | The ratio of white-collar (service industry) to blue-collar (manufacturing industry) population | Ratio |

### Economic Indicators

| Field Name | Description | Unit |
|------------|-------------|------|
| `gdp_100m` | The Gross Domestic Product (GDP) | 100 million yuan |
| `primary_industry_100m` | The output value of the primary industry | 100 million yuan |
| `secondary_industry_100m` | The output value of the secondary industry | 100 million yuan |
| `tertiary_industry_100m` | The output value of the tertiary industry | 100 million yuan |
| `gdp_per_capita_yuan` | The GDP per capita | Yuan |
| `national_gdp_100m` | The national GDP | 100 million yuan |
| `national_economic_primacy` | An indicator of the city's economic dominance compared to the nation | Indicator |
| `gdp_population_ratio` | The ratio of the city's GDP primacy to its national population share | Ratio |
| `secondary_industry_development_gdp_share` | The share of the secondary industry in the GDP, indicating its development | Percentage |
| `tertiary_industry_development_gdp_share` | The share of the tertiary industry in the GDP, indicating its development | Percentage |
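
The derived indicators above combine columns with different unit scales, so it helps to see the arithmetic once. This sketch assumes that per capita GDP uses the resident (not registered) population and that the ratios are plain fractions; neither is confirmed by the data dictionary. The function name and all input values are hypothetical.

```python
def city_ratios(
    gdp_100m: float,
    national_gdp_100m: float,
    resident_pop_10k: float,
    national_pop_10k: float,
) -> dict:
    """Recompute derived city indicators from their assumed components."""
    gdp_yuan = gdp_100m * 100_000_000      # 100 million yuan -> yuan
    residents = resident_pop_10k * 10_000  # 10,000 persons -> persons
    primacy = gdp_100m / national_gdp_100m
    pop_share = resident_pop_10k / national_pop_10k
    return {
        "gdp_per_capita_yuan": gdp_yuan / residents,
        "national_economic_primacy": primacy,
        "national_population_share": pop_share,
        "gdp_population_ratio": primacy / pop_share,
    }


# Hypothetical city: GDP of 30,000 (x 100 million yuan), 1,800 (x 10,000) residents.
example = city_ratios(30_000.0, 1_200_000.0, 1_800.0, 140_000.0)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

A `gdp_population_ratio` above 1 under these assumptions means the city's share of national GDP exceeds its share of national population.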

### Government Finance

| Field Name | Description | Unit |
|------------|-------------|------|
| `general_public_budget_revenue_100m` | The general public budget revenue | 100 million yuan |
| `personal_income_tax_100m` | The personal income tax collected | 100 million yuan |
| `per_capita_personal_income_tax_yuan` | The per capita personal income tax | Yuan |
| `general_public_budget_expenditure_100m` | The general public budget expenditure | 100 million yuan |
| `science_expenditure_10k` | The expenditure on science | 10,000 yuan |
| `education_expenditure_10k` | The expenditure on education | 10,000 yuan |

### Consumer Metrics

| Field Name | Description | Unit |
|------------|-------------|------|
| `total_retail_sales_of_consumer_goods_100m` | The total retail sales of consumer goods | 100 million yuan |
| `retail_sales_growth_rate` | The growth rate of retail sales | Percentage |
| `urban_consumer_price_index_previous_year_100` | The urban consumer price index, with the previous year set as 100 | Index (base=100) |
| `engel_coefficient` | The Engel coefficient, the share of household consumption expenditure spent on food | Coefficient |

### Income and Wages

| Field Name | Description | Unit |
|------------|-------------|------|
| `annual_average_wage_urban_non_private_employees_yuan` | The annual average wage of urban non-private unit employees | Yuan |
| `annual_average_wage_urban_non_private_on_duty_employees_yuan` | The annual average wage of urban non-private on-duty employees | Yuan |
| `per_capita_disposable_income_absolute_yuan` | The absolute value of per capita disposable income | Yuan |
| `per_capita_disposable_income_index_previous_year_100` | The per capita disposable income index, with the previous year set as 100 | Index (base=100) |

### Housing

| Field Name | Description | Unit |
|------------|-------------|------|
| `per_capita_housing_area_sqm` | The per capita housing area | Square meters (m²) |

### Education Infrastructure

| Field Name | Description | Unit |
|------------|-------------|------|
| `number_of_universities` | The total count of universities and colleges | Count |
| `university_students_10k` | The number of university students | 10,000 students |
| `number_of_middle_schools` | The total count of middle schools | Count |
| `middle_school_students_10k` | The number of middle school students | 10,000 students |
| `number_of_primary_schools` | The total count of primary schools | Count |
| `primary_school_students_10k` | The number of primary school students | 10,000 students |
| `number_of_kindergartens` | The total count of kindergartens | Count |
| `kindergarten_students_10k` | The number of kindergarten students | 10,000 students |

### Healthcare Infrastructure

| Field Name | Description | Unit |
|------------|-------------|------|
| `hospitals_health_centers` | The total count of hospitals and health centers | Count |
| `hospital_beds_10k` | The number of hospital beds | 10,000 beds |
| `health_technical_personnel_10k` | The number of health technical personnel | 10,000 persons |
| `doctors_10k` | The number of doctors | 10,000 doctors |

### Transportation Infrastructure

| Field Name | Description | Unit |
|------------|-------------|------|
| `road_length_km` | The total length of roads | Kilometers (km) |
| `road_area_10k_sqm` | The total area of roads | 10,000 square meters |
| `per_capita_urban_road_area_sqm` | The per capita urban road area | Square meters (m²) |
| `number_of_operating_bus_lines` | The total count of operating bus lines | Count |
| `operating_bus_line_length_km` | The total length of operating bus lines | Kilometers (km) |

### Internet Connectivity

| Field Name | Description | Unit |
|------------|-------------|------|
| `internet_broadband_access_subscribers_10k` | The number of internet broadband access subscribers | 10,000 subscribers |
| `internet_broadband_access_ratio` | The ratio of internet broadband access | Ratio/Percentage |

### Industrial and Investment

| Field Name | Description | Unit |
|------------|-------------|------|
| `number_of_industrial_enterprises_above_designated_size` | The total count of industrial enterprises above a designated size | Count |
| `total_current_assets_10k` | The total current assets | 10,000 yuan |
| `total_fixed_assets_10k` | The total fixed assets | 10,000 yuan |
| `main_business_taxes_and_surcharges_10k` | The main business taxes and surcharges | 10,000 yuan |
| `total_fixed_asset_investment_10k` | The total fixed asset investment | 10,000 yuan |
| `real_estate_development_investment_completed_10k` | The completed real estate development investment | 10,000 yuan |
| `residential_development_investment_completed_10k` | The completed residential development investment | 10,000 yuan |

---

## Notes and Conventions

### Monetary Units
- **10,000 yuan**: Values are expressed in units of ten thousand Chinese yuan (万元)
- **100 million yuan**: Values are expressed in units of one hundred million Chinese yuan (亿元)
- **Yuan**: Standard Chinese currency unit (¥)
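
As a quick consistency check on these units, the sketch below converts an amount in 万元 to yuan and derives a price per square meter. The helper name `wan_to_yuan` is hypothetical. The sample figures are the `2019-Jan`, `sector 23` row of `pre_owned_house_transactions.csv` previewed later in this notebook, and the assumption that price equals amount divided by area is inferred from that row rather than documented.

```python
WAN = 10_000  # 1 万元 = 10,000 yuan


def wan_to_yuan(amount_wan: float) -> float:
    """Convert an amount expressed in 10,000-yuan units to yuan."""
    return amount_wan * WAN


# Sample row (2019-Jan, sector 23) from pre_owned_house_transactions.csv
area_sqm = 3376        # area_pre_owned_house_transactions, in m²
amount_wan = 9137.764  # amount_pre_owned_house_transactions, in 10,000 yuan

# Assumed relationship: price (yuan per m²) = amount in yuan / area in m²
price_per_sqm = wan_to_yuan(amount_wan) / area_sqm
print(round(price_per_sqm, 2))
```

The result agrees with the `price_pre_owned_house_transactions` value reported for that row (27066.83649), which suggests the price columns are in yuan per square meter.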

### Population Units
- **10,000 persons**: Population counts expressed in units of ten thousand people (万人)

### Area Units
- **Square meters (m²)**: Standard metric unit for area measurements
- **10,000 square meters**: Larger areas expressed in units of ten thousand square meters

### Density Measures
Density metrics in the Sector POI dataset represent normalized values that account for the sector coverage area. These allow for fair comparison across sectors of different sizes.

### Date Formats
- **Month**: Documented as YYYY-MM (e.g., 2023-01); the CSV files previewed in this notebook store it as `YYYY-Mon` (e.g., `2019-Jan`)
- **Year**: Four-digit year (e.g., 2023)
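
The transaction files previewed in this notebook store months as strings such as `2019-Jan`. A minimal sketch of parsing that form with the standard library follows. `parse_month` is a hypothetical helper, and `%b` assumes English month abbreviations (the default C locale).

```python
from datetime import datetime


def parse_month(label: str) -> datetime:
    """Parse a 'YYYY-Mon' label such as '2019-Jan' into a datetime (day 1)."""
    return datetime.strptime(label, "%Y-%b")


months = ["2019-Mar", "2019-Jan", "2020-Feb"]

# Sorting the raw strings is alphabetical; sorting by parsed date is chronological
print(sorted(months))
print(sorted(months, key=parse_month))
```

Parsing before sorting matters for any lag or rolling feature, because alphabetical order places `Apr` ahead of `Jan`.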

### Data Relationships
- **Primary datasets** contain transaction or indicator data for specific sectors
- **Nearby sectors datasets** aggregate data from geographically adjacent sectors to capture spillover effects and regional trends
- **POI (Points of Interest)** data provides detailed infrastructure and amenity information at the sector level
- **City-level indexes** provide macro-economic and demographic context

### Key Concepts

**Sector**: A geographic subdivision used for spatial analysis of real estate and urban characteristics.

**Pre-owned Houses**: Previously occupied residential properties being resold (also known as second-hand or resale housing).

**New Houses**: Newly constructed residential properties being sold for the first time.

**Land Transactions**: Sales of land parcels, typically for future development.

**Nearby Sectors**: Adjacent or surrounding geographic sectors used to capture regional market dynamics.

**POI (Points of Interest)**: Locations of various facilities, services, and amenities within a sector.

**Density Metrics**: Normalized measures that account for the geographic size of sectors, enabling fair comparisons.

---

## Data Quality Considerations

When working with these datasets:

1. **Missing Values**: Some sectors may have missing data for certain time periods or metrics
2. **Temporal Coverage**: Different datasets may have different time ranges
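3. **Aggregation Levels**: Check whether a value is a total or an average; the nearby-sector tables, for example, contain fractional transaction counts, which suggests they are averaged over neighboring sectors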
4. **Unit Consistency**: Always verify the unit of measurement when performing calculations
5. **Seasonal Patterns**: Real estate transactions often exhibit seasonal variations
6. **Regional Variations**: Different sectors may have vastly different characteristics based on urban vs suburban locations
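
The first two points can be checked directly. The sketch below uses pandas on a small made-up frame (the values are illustrative, not taken from the dataset) to report the months covered and to list the (month, sector) pairs that are absent from a table.

```python
import pandas as pd

# Toy stand-in for one transaction table; the real files share these key columns
df = pd.DataFrame(
    {
        "month": ["2019-Jan", "2019-Jan", "2019-Feb"],
        "sector": ["sector 1", "sector 2", "sector 1"],
        "num_transactions": [5, 3, 7],
    }
)

# Temporal coverage: distinct months present in this table
covered_months = df["month"].unique().tolist()

# Every (month, sector) pair that would exist if the panel were complete
full_grid = pd.MultiIndex.from_product(
    [df["month"].unique(), df["sector"].unique()], names=["month", "sector"]
)
present = pd.MultiIndex.from_frame(df[["month", "sector"]])
missing_pairs = full_grid.difference(present).tolist()

print(covered_months)
print(missing_pairs)
```

Running the same check per file makes differing time ranges and silent gaps visible before any tables are merged.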

---

## Common Analysis Use Cases

This data dictionary supports various analytical tasks:

- **Price Prediction**: Forecasting real estate prices using historical transactions and POI features
- **Market Segmentation**: Identifying distinct real estate market clusters based on sector characteristics
- **Trend Analysis**: Tracking temporal patterns in transactions, prices, and inventory
- **Spatial Analysis**: Understanding how nearby sector characteristics influence local markets
- **Demand Forecasting**: Predicting future transaction volumes and inventory needs
- **Infrastructure Planning**: Analyzing the relationship between amenities and real estate activity
- **Economic Impact Assessment**: Evaluating how city-level economic indicators affect local markets

---

In [9]:
# Core Libraries
import numpy as np
import pandas as pd
import warnings
import os
import polars as pl
warnings.filterwarnings('ignore')

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Styling
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11
COLORS = ['#2E86AB', '#A23B72', '#F18F01', '#C73E1D', '#6A994E']

# Statistical Analysis
from scipy import stats
from scipy.stats import chi2_contingency, pearsonr

print("✅ All libraries imported successfully!")
print(f"📦 Pandas version: {pd.__version__}")
print(f"📦 NumPy version: {np.__version__}")
print(f"📦 Polars version: {pl.__version__}")

✅ All libraries imported successfully!
📦 Pandas version: 2.2.2
📦 NumPy version: 2.0.2
📦 Polars version: 1.25.2


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
import os

# Remember the starting directory, then switch to the training-data folder on Drive
currPath = os.getcwd()
os.chdir("/content/drive/MyDrive/china-real-estate-demand-prediction/train")

# List the files in the training folder
dir_list = os.listdir(os.getcwd())
print(dir_list)
print(len(dir_list))

['land_transactions.csv', 'new_house_transactions.csv', 'sector_POI.csv', 'pre_owned_house_transactions.csv', 'new_house_transactions_nearby_sectors.csv', 'land_transactions_nearby_sectors.csv', 'city_search_index.csv', 'pre_owned_house_transactions_nearby_sectors.csv', 'city_indexes.csv']
9


In [4]:
print("✓ Pre requisites added")
month_codes = {
    1: 'Jan',
    2: 'Feb',
    3: 'Mar',
    4: 'Apr',
    5: 'May',
    6: 'Jun',
    7: 'Jul',
    8: 'Aug',
    9: 'Sep',
    10: 'Oct',
    11: 'Nov',
    12: 'Dec'
}

✓ Pre requisites added
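
The same lookup is available from the standard library. As a sketch, `calendar.month_abbr` yields the English abbreviations under the default C locale, so the dictionary can be built in one line. The name `month_codes_alt` is introduced here only for illustration.

```python
import calendar

# month_abbr[0] is an empty string, so the range starts at 1
month_codes_alt = {n: calendar.month_abbr[n] for n in range(1, 13)}
print(month_codes_alt[1], month_codes_alt[12])
```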


In [10]:
for i in dir_list:
    df = pd.read_csv(os.path.join(os.getcwd(), i))
    print(f"\n{'='*50}")
    print(f"File: {i}")
    print(f"{'='*50}")

    # if 'month' in df.columns:
    #     # Convert 'month' to datetime objects
    #     df['month'] = pd.to_datetime(df['month'])

    #     # Extract year and month abbreviation
    #     year_values = df['month'].dt.year.astype(str)  # Convert year to string
    #     month_values = df['month'].dt.month.map(month_codes)

    #     # Create the 'id' column
    #     df.insert(0, 'id', year_values + ' ' + month_values)

    #     df.drop('month', axis=1, inplace=True)


    display(df.head(10))
    print(f"\nShape: {df.shape}")

print("✓ Pre requisites added")  # NOTE: leftover status message from the setup cell; runs once after the loop


File: land_transactions.csv


Unnamed: 0,month,sector,num_land_transactions,construction_area,planned_building_area,transaction_amount
0,2019-Jan,sector 74,0,0.0,0.0,0.0
1,2019-Jan,sector 35,0,0.0,0.0,0.0
2,2019-Jan,sector 23,0,0.0,0.0,0.0
3,2019-Jan,sector 80,0,0.0,0.0,0.0
4,2019-Jan,sector 53,0,0.0,0.0,0.0
5,2019-Jan,sector 84,0,0.0,0.0,0.0
6,2019-Jan,sector 16,0,0.0,0.0,0.0
7,2019-Jan,sector 27,0,0.0,0.0,0.0
8,2019-Jan,sector 82,0,0.0,0.0,0.0
9,2019-Jan,sector 14,0,0.0,0.0,0.0



Shape: (5896, 6)

File: new_house_transactions.csv


Unnamed: 0,month,sector,num_new_house_transactions,area_new_house_transactions,price_new_house_transactions,amount_new_house_transactions,area_per_unit_new_house_transactions,total_price_per_unit_new_house_transactions,num_new_house_available_for_sale,area_new_house_available_for_sale,period_new_house_sell_through
0,2019-Jan,sector 1,52,4906,28184,13827.14,94,265.91,159.0,15904.0,3.78
1,2019-Jan,sector 2,145,15933,17747,28277.73,110,195.02,1491.0,175113.0,12.29
2,2019-Jan,sector 4,6,725,28004,1424.21,127,356.05,40.0,6826.0,5.95
3,2019-Jan,sector 5,2,212,37432,792.1,106,396.05,161.0,17173.0,83.95
4,2019-Jan,sector 6,5,773,15992,607.94,95,151.99,189.0,19696.0,14.27
5,2019-Jan,sector 7,93,12678,31018,39326.1,136,422.86,867.0,124626.0,7.74
6,2019-Jan,sector 8,51,7183,14555,10454.68,141,204.99,1719.0,261360.0,40.71
7,2019-Jan,sector 9,13,840,52694,4170.55,66,347.55,95.0,14612.0,8.36
8,2019-Jan,sector 10,40,3629,30431,11043.29,91,276.08,209.0,19380.0,1.73
9,2019-Jan,sector 11,80,8935,11871,10606.75,112,132.58,602.0,118890.0,16.06



Shape: (5433, 11)

File: sector_POI.csv


Unnamed: 0,sector,sector_coverage,population_scale,residential_area,office_building,commercial_area,resident_population,office_population,number_of_shops,catering,...,medical_health_rehabilitation_institution_dense,medical_health_first_aid_center_dense,medical_health_blood_donation_station_dense,medical_health_disease_prevention_institution_dense,medical_health_general_hospital_dense,medical_health_clinic_dense,education_training_school_education_middle_school_dense,education_training_school_education_primary_school_dense,education_training_school_education_kindergarten_dense,education_training_school_education_research_institution_dense
0,sector 23,0.410668,113800,68,58,4,88000,36000,2398,597,...,2.43e-05,0.0,0.0,0.0,7.29e-06,2.55e-05,1.22e-06,6.08e-06,1.34e-05,3.65e-06
1,sector 80,0.426227,74388,23,9,0,37173,53683,837,277,...,1.34e-06,0.0,0.0,1.91e-07,1.15e-06,5.74e-07,3.83e-07,3.83e-07,1.72e-06,2.49e-06
2,sector 84,0.448303,146800,136,62,1,118000,40000,2776,697,...,1.5e-05,7.48e-07,0.0,7.48e-07,5.98e-06,1.27e-05,5.23e-06,4.49e-06,9.72e-06,5.23e-06
3,sector 16,0.688514,1435300,2542,434,31,1023000,589000,62837,11222,...,0.000149372,0.0,0.0,7.47e-05,9.96e-05,0.000360982,2.49e-05,4.98e-05,0.000323639,0.000124477
4,sector 27,0.366972,40600,24,22,1,22000,27000,805,226,...,4.58e-06,0.0,0.0,0.0,3.66e-06,9.15e-07,0.0,9.15e-07,4.58e-06,9.15e-07
5,sector 82,1.762917,183900,115,89,9,131000,76000,3470,1216,...,3.78e-05,0.0,0.0,2.91e-06,1.16e-05,3.05e-05,1.31e-05,1.02e-05,1.89e-05,1.6e-05
6,sector 14,0.697386,158000,77,19,17,138000,29000,5933,728,...,7.03e-06,1e-06,0.0,3.01e-06,5.02e-06,8.03e-06,8.03e-06,5.02e-06,2.11e-05,2.11e-05
7,sector 6,0.590093,247389,98,60,16,216373,48079,9322,2514,...,4.64e-07,0.0,0.0,0.0,6.86e-07,7.6e-07,9.27e-08,3.34e-07,5.75e-07,1.11e-07
8,sector 66,0.08644,40100,8,11,0,33000,10000,198,78,...,0.0,0.0,0.0,7.38e-07,0.0,7.38e-07,0.0,0.0,2.21e-06,7.38e-07
9,sector 42,0.072261,57058,45,0,1,57058,0,1089,273,...,3.91e-06,0.0,0.0,6.52e-07,1.96e-06,2.61e-06,1.04e-05,6.52e-06,1.17e-05,0.0



Shape: (86, 142)

File: pre_owned_house_transactions.csv


Unnamed: 0,month,sector,area_pre_owned_house_transactions,amount_pre_owned_house_transactions,num_pre_owned_house_transactions,price_pre_owned_house_transactions
0,2019-Jan,sector 35,548,33.2,6,605.839416
1,2019-Jan,sector 23,3376,9137.764,48,27066.83649
2,2019-Jan,sector 80,3804,8980.0,44,23606.72976
3,2019-Jan,sector 53,0,0.0,0,11301.43941
4,2019-Jan,sector 84,9941,51515.904,103,51821.65175
5,2019-Jan,sector 16,970,3661.0,10,37742.26804
6,2019-Jan,sector 27,2040,3067.7,23,15037.7451
7,2019-Jan,sector 82,436,2487.4,6,57050.45872
8,2019-Jan,sector 14,2567,8262.3,36,32186.59914
9,2019-Jan,sector 6,388,993.3,4,25600.51546



Shape: (5360, 6)

File: new_house_transactions_nearby_sectors.csv


Unnamed: 0,month,sector,num_new_house_transactions_nearby_sectors,area_new_house_transactions_nearby_sectors,price_new_house_transactions_nearby_sectors,amount_new_house_transactions_nearby_sectors,area_per_unit_new_house_transactions_nearby_sectors,total_price_per_unit_new_house_transactions_nearby_sectors,num_new_house_available_for_sale_nearby_sectors,area_new_house_available_for_sale_nearby_sectors,period_new_house_sell_through_nearby_sectors
0,2019-Jan,sector 35,129.25,13212.5,21172.85714,27974.6375,102.224371,216.438201,2526.75,302828.5,21.91
1,2019-Jan,sector 23,27.4,2822.4,47592.19459,13432.421,103.007299,490.234343,390.6,47866.6,11.15
2,2019-Jan,sector 80,81.285714,8670.0,25508.56484,22115.92571,106.660808,272.076415,1124.285714,131539.7143,10.467143
3,2019-Jan,sector 53,28.5,3428.833333,39242.77451,13455.69333,120.309941,472.129591,350.5,43073.66667,15.613333
4,2019-Jan,sector 84,8.857143,1304.428571,62359.84558,8134.396429,147.274193,918.399597,207.4,35174.0,26.246
5,2019-Jan,sector 16,33.8,4161.5,49146.30061,20452.233,123.121302,605.095651,467.7,63791.5,29.495
6,2019-Jan,sector 27,25.571429,3118.285714,32495.87227,10133.14143,121.944134,396.268101,598.666667,78131.0,44.193333
7,2019-Jan,sector 82,8.4,1062.5,63916.02824,6791.078,126.488095,808.461667,152.5,24092.33333,10.966667
8,2019-Jan,sector 14,27.222222,3153.222222,58035.0999,18299.75667,115.832653,672.235959,367.0,49988.375,16.4925
9,2019-Jan,sector 6,140.888889,15084.66667,20778.66045,31343.91667,107.067823,222.472595,1613.888889,195501.5556,11.54



Shape: (5360, 11)

File: land_transactions_nearby_sectors.csv


Unnamed: 0,month,sector,num_land_transactions_nearby_sectors,construction_area_nearby_sectors,planned_building_area_nearby_sectors,transaction_amount_nearby_sectors
0,2019-Jan,sector 35,0.0,0.0,0.0,0.0
1,2019-Jan,sector 23,0.0,0.0,0.0,0.0
2,2019-Jan,sector 80,0.0,0.0,0.0,0.0
3,2019-Jan,sector 53,0.0,0.0,0.0,0.0
4,2019-Jan,sector 84,0.0,0.0,0.0,0.0
5,2019-Jan,sector 16,0.0,0.0,0.0,0.0
6,2019-Jan,sector 27,0.0,0.0,0.0,0.0
7,2019-Jan,sector 82,0.0,0.0,0.0,0.0
8,2019-Jan,sector 14,0.0,0.0,0.0,0.0
9,2019-Jan,sector 6,0.0,0.0,0.0,0.0



Shape: (5025, 6)

File: city_search_index.csv


Unnamed: 0,month,keyword,source,search_volume
0,2019-Jan,买房,PC端,1914
1,2019-Jan,买房,移动端,2646
2,2019-Jan,二手房市场,PC端,192
3,2019-Jan,二手房市场,移动端,204
4,2019-Jan,公积金,PC端,9160
5,2019-Jan,公积金,移动端,6925
6,2019-Jan,利率上调,PC端,80
7,2019-Jan,利率上调,移动端,0
8,2019-Jan,去库存,PC端,800
9,2019-Jan,去库存,移动端,625



Shape: (4020, 4)

File: pre_owned_house_transactions_nearby_sectors.csv


Unnamed: 0,month,sector,num_pre_owned_house_transactions_nearby_sectors,area_pre_owned_house_transactions_nearby_sectors,amount_pre_owned_house_transactions_nearby_sectors,price_pre_owned_house_transactions_nearby_sectors
0,2019-Jan,sector 1,6.75,733.0,1247.038,17012.79673
1,2019-Jan,sector 2,64.181818,5339.0,20880.24282,39108.90208
2,2019-Jan,sector 3,77.714286,7457.142857,17376.21486,23301.43755
3,2019-Jan,sector 4,57.666667,5109.666667,19021.15267,37225.81904
4,2019-Jan,sector 5,45.428571,3763.5,15800.74143,41984.16747
5,2019-Jan,sector 6,38.3,3201.5,10070.514,31455.61143
6,2019-Jan,sector 7,41.428571,3914.714286,9200.657143,23502.75517
7,2019-Jan,sector 8,57.5,4886.9,20190.8625,41316.2997
8,2019-Jan,sector 9,53.222222,4677.0,19835.21844,42410.13138
9,2019-Jan,sector 10,55.777778,5630.444444,9644.433333,17129.08



Shape: (5427, 6)

File: city_indexes.csv


Unnamed: 0,city_indicator_data_year,year_end_registered_population_10k,total_households_10k,year_end_resident_population_10k,year_end_total_employed_population_10k,year_end_urban_non_private_employees_10k,private_individual_and_other_employees_10k,private_individual_ratio,national_year_end_total_population_10k,resident_registered_ratio,...,internet_broadband_access_ratio,number_of_industrial_enterprises_above_designated_size,total_current_assets_10k,total_fixed_assets_10k,main_business_taxes_and_surcharges_10k,total_fixed_asset_investment_10k,real_estate_development_investment_completed_10k,residential_development_investment_completed_10k,science_expenditure_10k,education_expenditure_10k
0,2017,897.87,295.0211,1449.84,862.33,329.17,708.0004,0.821032,139008,1.614755,...,0.580841,4664,96779199,39013240.0,4012517.0,,27028935.0,17694861.0,1712569.0,4043335.0
1,2018,927.69,305.9851,1490.44,1102.36,348.65,753.7146,0.683728,139538,1.606614,...,0.586586,4675,99067099,,4297060.0,,27019323.0,17466618.0,1636655.0,4408209.0
2,2019,953.72,313.8455,1530.59,1125.89,400.22,725.672,0.644532,140005,1.604863,...,0.593287,5804,110643480,39934058.0,4862051.0,,31022573.0,20870742.0,2439456.0,5239743.0
3,2020,985.11,322.1068,1874.03,1158.01,419.36,738.6462,0.637858,141212,1.902356,...,0.629148,6208,123664111,,,,,,2241321.0,5585916.0
4,2021,1011.53,329.7224,1881.06,1163.44,426.94,736.5017,0.633038,141260,1.859619,...,0.686762,6757,142064810,,,,,,2012478.0,5890857.0
5,2022,1034.91,336.1935,1873.41,1119.82,424.9,694.9159,0.62056,141175,1.810215,...,0.736798,6878,159482630,,,,,,1982136.0,6269391.0
6,2022,1034.91,336.1935,1873.41,1119.82,424.9,694.9159,0.62056,141175,1.810215,...,0.736798,6878,159482630,,,,,,,



Shape: (7, 74)
✓ Pre requisites added


In [6]:
for i in dir_list:
    df = pd.read_csv(os.path.join(os.getcwd(), i))
    print(f"\n{'='*60}")
    print(f"📁 File: {i}")
    print(f"{'='*60}")
    print(f"Rows: {df.shape[0]:,} | Columns: {df.shape[1]}")
    print(f"Total null values: {df.isnull().sum().sum():,}")
    print(f"Duplicated rows: {df.duplicated().sum():,}")
    print(f"\nData Types:")
    for dtype, count in df.dtypes.value_counts().items():
        print(f"  {dtype}: {count} columns")

    null_cols = df.isnull().sum()
    if null_cols.sum() > 0:
        print(f"\nColumns with null values:")
        for col, count in null_cols[null_cols > 0].items():
            print(f"  {col}: {count:,} ({count/len(df)*100:.1f}%)")


📁 File: land_transactions.csv
Rows: 5,896 | Columns: 6
Total null values: 0
Duplicated rows: 0

Data Types:
  float64: 3 columns
  object: 2 columns
  int64: 1 columns

📁 File: new_house_transactions.csv
Rows: 5,433 | Columns: 11
Total null values: 42
Duplicated rows: 0

Data Types:
  float64: 5 columns
  int64: 4 columns
  object: 2 columns

Columns with null values:
  num_new_house_available_for_sale: 14 (0.3%)
  area_new_house_available_for_sale: 14 (0.3%)
  period_new_house_sell_through: 14 (0.3%)

📁 File: sector_POI.csv
Rows: 86 | Columns: 142
Total null values: 7
Duplicated rows: 0

Data Types:
  int64: 73 columns
  float64: 68 columns
  object: 1 columns

Columns with null values:
  surrounding_housing_average_price: 4 (4.7%)
  surrounding_shop_average_rent: 3 (3.5%)

📁 File: pre_owned_house_transactions.csv
Rows: 5,360 | Columns: 6
Total null values: 14
Duplicated rows: 0

Data Types:
  object: 2 columns
  int64: 2 columns
  float64: 2 columns

Columns with null values:
  pric

# **Group all tables into one for train.csv**

In [42]:
base_path = os.getcwd()

new_house_transactions = pd.read_csv(os.path.join(base_path, 'new_house_transactions.csv'))
new_house_transactions_nearby_sectors = pd.read_csv(os.path.join(base_path, 'new_house_transactions_nearby_sectors.csv'))
pre_owned_house_transactions = pd.read_csv(os.path.join(base_path, 'pre_owned_house_transactions.csv'))
pre_owned_house_transactions_nearby_sectors = pd.read_csv(os.path.join(base_path, 'pre_owned_house_transactions_nearby_sectors.csv'))
city_search_index = pd.read_csv(os.path.join(base_path, 'city_search_index.csv'))
sector_POI = pd.read_csv(os.path.join(base_path, 'sector_POI.csv'))
land_transactions = pd.read_csv(os.path.join(base_path, 'land_transactions.csv'))
land_transactions_nearby_sectors = pd.read_csv(os.path.join(base_path, 'land_transactions_nearby_sectors.csv'))
city_indexes = pd.read_csv(os.path.join(base_path, 'city_indexes.csv'))

print("✓ All datasets loaded successfully")

✓ All datasets loaded successfully
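
The nine `read_csv` calls can also be written as a loop. This is a sketch of one alternative, not the notebook's method: `load_tables` is a hypothetical helper that reads every CSV in a folder into a dictionary keyed by file stem, which avoids relying on `os.chdir`.

```python
from pathlib import Path

import pandas as pd


def load_tables(folder):
    """Read every *.csv in `folder` into a dict keyed by file name without extension."""
    return {p.stem: pd.read_csv(p) for p in sorted(Path(folder).glob("*.csv"))}


# Usage (path as used earlier in this notebook):
# tables = load_tables("/content/drive/MyDrive/china-real-estate-demand-prediction/train")
# new_house_transactions = tables["new_house_transactions"]
```

Keeping the tables in one dictionary also makes it easy to loop over them for the null and duplicate summaries shown earlier.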


In [113]:
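# Build a complete month x sector grid (plus "sector 95"), then left-join each table onto it:
# transaction tables on (month, sector), city_indexes on year, sector_POI on sector.
# Missing values are filled with the sentinel -1.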
data1 = (
    pl.DataFrame(
        new_house_transactions['month'].unique()
    )
    .rename(
        {
            'column_0': 'month'
        }
    )
    .join(
        pl.DataFrame(list(new_house_transactions["sector"].unique()) + ["sector 95"])
        .rename({"column_0": "sector"}),
        how="cross",
    )
    .fill_null(-1)
    .join(
        pl.DataFrame(new_house_transactions),
        on=['month', 'sector'],
        how='left'
    )
    .fill_null(-1)
    .join(
        pl.DataFrame(new_house_transactions_nearby_sectors),
        on=['month', 'sector'],
        how='left'
    )
    .fill_null(-1)
    .join(
        pl.DataFrame(pre_owned_house_transactions),
        on=['month', 'sector'],
        how='left'
    )
    .join(
        pl.DataFrame(pre_owned_house_transactions_nearby_sectors),
        on=['month', 'sector'],
        how='left'
    )
    .fill_null(-1)
    .join(
        pl.DataFrame(land_transactions),
        on=['month', 'sector'],
        how='left'
    )
    .fill_null(-1)
    .join(
        pl.DataFrame(land_transactions_nearby_sectors),
        on=['month', 'sector'],
        how='left'
    )
    .fill_null(-1)
    .with_columns(
        pl.col('month').str.split('-').list.get(0).cast(pl.Int16).alias('year'),
    )
    .join(
        pl.DataFrame(city_indexes).rename({"city_indicator_data_year": "year"}),
        on=["year"],
        how="left"
    )
    .fill_null(-1)
    .join(pl.DataFrame(sector_POI), on=["sector"], how="left")
    .fill_null(-1)
    .with_columns(
        id = pl.col('month').str.split('-').list.get(0).cast(pl.Int16).cast(pl.Utf8) + ' ' + pl.col('month').str.split('-').list.get(1).cast(pl.Utf8) + '_' + pl.col('sector').cast(pl.Utf8)
    )
    .sort(['id'])
    .select(['id', pl.col('*').exclude('id')])
    .drop(['month','sector'])
)

# data1.null_count()  # uncomment to inspect per-column null counts

In [117]:
# Drop Int64/Float64 columns whose values are all zero (min == max == 0)
for col in data1.columns:
    if data1[col].dtype in (pl.Int64, pl.Float64):
        c_min, c_max = data1[col].min(), data1[col].max()

        if c_min == 0 and c_max == 0:
            data1 = data1.drop(col)
            print(f"Dropped all-zero column: {col}")

In [118]:
data1

id,num_new_house_transactions,area_new_house_transactions,price_new_house_transactions,amount_new_house_transactions,area_per_unit_new_house_transactions,total_price_per_unit_new_house_transactions,num_new_house_available_for_sale,area_new_house_available_for_sale,period_new_house_sell_through,num_new_house_transactions_nearby_sectors,area_new_house_transactions_nearby_sectors,price_new_house_transactions_nearby_sectors,amount_new_house_transactions_nearby_sectors,area_per_unit_new_house_transactions_nearby_sectors,total_price_per_unit_new_house_transactions_nearby_sectors,num_new_house_available_for_sale_nearby_sectors,area_new_house_available_for_sale_nearby_sectors,period_new_house_sell_through_nearby_sectors,area_pre_owned_house_transactions,amount_pre_owned_house_transactions,num_pre_owned_house_transactions,price_pre_owned_house_transactions,num_pre_owned_house_transactions_nearby_sectors,area_pre_owned_house_transactions_nearby_sectors,amount_pre_owned_house_transactions_nearby_sectors,price_pre_owned_house_transactions_nearby_sectors,num_land_transactions,construction_area,planned_building_area,transaction_amount,num_land_transactions_nearby_sectors,construction_area_nearby_sectors,planned_building_area_nearby_sectors,transaction_amount_nearby_sectors,year,year_end_registered_population_10k,…,shopping_centers_dense,hotel_commercial_dense,third_tier_shopping_malls_in_business_district_dense,second_tier_shopping_malls_in_business_district_dense,city_winner_malls_dense,shopping_malls_with_street_facing_shops_dense,unranked_malls_dense,community_malls_dense,community_winner_malls_dense,key_focus_malls_dense,transportation_facilities_service_bus_station_dense,transportation_facilities_service_subway_station_dense,transportation_facilities_service_airport_related_dense,transportation_facilities_service_port_terminal_dense,transportation_facilities_service_train_station_dense,transportation_facilities_service_light_rail_station_dense,transportation_facilities_service_long_distance_bus_station_dense,leisure_entertainment_entertainment_venue_game_arcade_dense,leisure_entertainment_entertainment_venue_party_house_dense,leisure_entertainment_cultural_venue_cultural_palace_dense,office_building_industrial_building_industrial_building_dense,medical_health_dense,medical_health_specialty_hospital_dense,medical_health_tcm_hospital_dense,medical_health_physical_examination_institution_dense,medical_health_veterinary_station_dense,medical_health_pharmaceutical_healthcare_dense,medical_health_rehabilitation_institution_dense,medical_health_first_aid_center_dense,medical_health_blood_donation_station_dense,medical_health_disease_prevention_institution_dense,medical_health_general_hospital_dense,medical_health_clinic_dense,education_training_school_education_middle_school_dense,education_training_school_education_primary_school_dense,education_training_school_education_kindergarten_dense,education_training_school_education_research_institution_dense
str,i64,i64,i64,f64,i64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,i64,f64,i64,f64,f64,f64,f64,f64,i64,f64,f64,f64,f64,f64,f64,f64,i16,f64,…,f64,f64,f64,f64,f64,i64,f64,f64,f64,i64,f64,f64,f64,f64,f64,i64,f64,f64,f64,f64,i64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
"""2019 Apr_sector 1""",69,6935,38392,26626.68,101,385.89,141.0,12936.0,2.83,25.0,2611.777778,51527.35897,13457.80111,104.471111,538.312044,300.125,41819.875,16.21375,9228,39411.1,106,42708.17078,6.75,855.25,1718.0,20087.69366,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019,953.72,…,8.6000e-7,0.0,0.0,0.0,0.0,0,0.000005,0.0,0.0,0,0.000243,0.0000955,0.0,0.0,0.0,0,0.0,0.0000267,0.0000146,0.0000138,0,0.000563,0.0000318,0.0,0.0,0.0,0.000339,0.000113,0.0,0.0,8.6000e-7,0.0000409,0.0000378,0.0000155,0.0000284,0.0000632,0.0000138
"""2019 Apr_sector 10""",12,1181,28686,3387.26,98,282.27,123.0,11736.0,1.74,223.0,24602.0,23783.95049,58513.275,110.32287,262.391368,1648.0,199520.5,13.55,1636,3128.0,21,19119.8044,92.555556,9642.444444,17389.51111,18034.33892,0,0.0,0.0,0.0,0.5,18551.475,55654.425,65416.75,2019,953.72,…,1.5700e-7,0.0,0.0,0.0,0.0,0,4.7200e-7,0.0,0.0,0,0.00001,0.000002,0.0,4.7200e-7,0.0,0,0.0,1.5700e-7,0.0,1.5700e-7,0,0.0000145,4.7200e-7,0.0,1.5700e-7,0.0,0.000009,0.000001,0.0,0.0,0.0,0.0000011,0.000002,0.000001,7.8700e-7,0.000003,7.8700e-7
"""2019 Apr_sector 11""",28,3282,11596,3805.46,117,135.91,621.0,121036.0,23.47,83.5,8007.0,16104.30873,12894.72,95.892216,154.427784,1250.5,149401.5,9.09,730,965.0,6,13219.17808,16.625,1483.125,4619.1,31144.37421,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019,953.72,…,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0000014,0.0,0.0,0.0,0.0,0,0.0,0.0,4.1200e-7,2.2900e-8,0,7.0900e-7,0.0,0.0,2.2900e-8,0.0,4.3500e-7,6.8700e-8,0.0,0.0,0.0,9.1500e-8,9.1500e-8,1.1400e-7,1.1400e-7,9.1500e-8,1.1400e-7
"""2019 Apr_sector 12""",25,2822,20674,5833.06,113,233.32,239.0,29774.0,4.77,186.4,20190.8,23968.08745,48393.486,108.319743,259.621706,1345.0,153243.2,18.794,131,238.0,1,18167.93893,98.142857,9888.571429,19303.11429,19520.62988,0,0.0,0.0,0.0,0.2,9080.98,27242.94,24520.0,2019,953.72,…,0.0,0.0,0.0,0.0,0.0,0,0.0000004,0.0,0.0,0,0.0000124,0.0,0.0,0.0,0.0,0,0.0,0.0000008,0.0,0.0000004,0,0.000022,0.0000008,0.0,0.0,0.0,0.0000148,0.0000012,0.0,0.0,0.0,0.0000004,0.0000048,0.0000004,0.0,0.0000024,0.0000008
"""2019 Apr_sector 13""",87,8601,40474,34810.78,99,400.12,703.0,74686.0,5.76,16.5,2100.166667,54392.65138,11423.36333,127.282828,692.32505,255.333333,37375.16667,34.735,9913,29317.2,118,29574.49813,37.6,3340.7,11428.368,34209.501,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2019,953.72,…,8.0700e-7,0.0,0.0,0.0,0.0,0,0.000002,0.0,0.000001,0,0.000976,0.000188,0.0,0.0,0.0000125,0,0.0000876,0.0000501,0.0,0.0000375,0,0.00428,0.000363,0.0000125,0.0,0.0,0.002378,0.000488,0.0,0.0,0.0000125,0.000225,0.000776,0.0000751,0.00015,0.000313,0.000125
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
"""2024 May_sector 92""",174,18459,15309,28258.65,106,162.41,5583.0,613232.0,38.25,119.333333,11382.5,17208.67267,19587.77167,95.384078,164.143338,3627.0,365448.75,30.6775,11080,18291.1428,100,16508.25162,-1.0,-1.0,-1.0,-1.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2024,-1.0,…,1.7600e-7,0.0,0.0,0.0,0.0,0,2.1100e-7,3.5100e-8,1.0500e-7,0,0.000003,3.1600e-7,0.0,0.0,0.0,0,0.0,1.7600e-7,3.5100e-8,2.8100e-7,0,0.000017,4.9200e-7,0.0,3.5100e-8,0.0,0.0000112,0.000002,7.0200e-8,7.0200e-8,7.0200e-8,5.6200e-7,0.000002,3.8600e-7,7.3800e-7,0.000002,1.4000e-7
"""2024 May_sector 93""",42,5359,29691,15912.65,128,378.87,1313.0,145619.0,23.98,100.0,10868.66667,29769.93191,32355.94667,108.686667,323.559467,2553.333333,277863.0,31.25,3484,6544.0,33,18783.00804,-1.0,-1.0,-1.0,-1.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2024,-1.0,…,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.0000101,0.000004,0.0,0.0,0.0,0,0.0,0.0,0.0,0.000002,0,0.0000334,0.000004,0.0,0.0,0.0,0.000021,0.000004,0.0,0.0,0.0,0.0,0.000005,7.7600e-7,0.000002,0.000008,0.0
"""2024 May_sector 94""",48,5569,32421,18053.83,116,376.12,1745.0,204550.0,22.06,59.8,6138.2,18648.24216,11446.664,102.645485,191.415786,1552.8,158603.2,29.36,6115,10593.6,71,17323.95748,-1.0,-1.0,-1.0,-1.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2024,-1.0,…,2.1500e-7,0.0,0.0,0.0,0.0,0,4.3100e-7,0.0,0.0,0,0.000003,5.3800e-7,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0,9.6900e-7,0.0,0.0,0.0,0.0,5.3800e-7,1.0800e-7,0.0,0.0,0.0,2.1500e-7,1.0800e-7,2.1500e-7,1.0800e-7,3.2300e-7,1.0800e-7
"""2024 May_sector 95""",-1,-1,-1,-1.0,-1,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1,-1.0,-1,-1.0,-1.0,-1.0,-1.0,-1.0,0,0.0,0.0,0.0,-1.0,-1.0,-1.0,-1.0,2024,-1.0,…,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0,0.000283,0.0000202,0.0,0.0000202,0.0,0,0.0,0.0,0.0,0.0000202,0,0.000748,0.0000808,0.0000202,0.0,0.0,0.000384,0.000141,0.0,0.0,0.0,0.0000606,0.0000606,0.0,0.0,0.0000404,0.0000606


In [109]:
# Check for duplicate column names
duplicate_cols = pd.Series(data1.columns).duplicated().tolist()

print(f"\n5️⃣ DUPLICATE COLUMNS")
if not any(duplicate_cols):
    print(f"   ✅ No duplicate column names found")
else:
    print(f"   ⚠️ {sum(duplicate_cols)} duplicate column names detected:")
    print(f"   Duplicates: {pd.Series(data1.columns)[pd.Series(data1.columns).duplicated()].tolist()}")

# Check for duplicate rows
duplicates = data1.is_duplicated().sum()
print(f"\n4️⃣ DUPLICATE ROWS")
if duplicates == 0:
    print(f"   ✅ No duplicate records found")
else:
    print(f"   ⚠️ {duplicates} duplicate rows detected")


5️⃣ DUPLICATE COLUMNS
   ✅ No duplicate column names found

4️⃣ DUPLICATE ROWS
   ✅ No duplicate records found


In [120]:
data1.describe()

statistic,id,num_new_house_transactions,area_new_house_transactions,price_new_house_transactions,amount_new_house_transactions,area_per_unit_new_house_transactions,total_price_per_unit_new_house_transactions,num_new_house_available_for_sale,area_new_house_available_for_sale,period_new_house_sell_through,num_new_house_transactions_nearby_sectors,area_new_house_transactions_nearby_sectors,price_new_house_transactions_nearby_sectors,amount_new_house_transactions_nearby_sectors,area_per_unit_new_house_transactions_nearby_sectors,total_price_per_unit_new_house_transactions_nearby_sectors,num_new_house_available_for_sale_nearby_sectors,area_new_house_available_for_sale_nearby_sectors,period_new_house_sell_through_nearby_sectors,area_pre_owned_house_transactions,amount_pre_owned_house_transactions,num_pre_owned_house_transactions,price_pre_owned_house_transactions,num_pre_owned_house_transactions_nearby_sectors,area_pre_owned_house_transactions_nearby_sectors,amount_pre_owned_house_transactions_nearby_sectors,price_pre_owned_house_transactions_nearby_sectors,num_land_transactions,construction_area,planned_building_area,transaction_amount,num_land_transactions_nearby_sectors,construction_area_nearby_sectors,planned_building_area_nearby_sectors,transaction_amount_nearby_sectors,year,…,shopping_centers_dense,hotel_commercial_dense,third_tier_shopping_malls_in_business_district_dense,second_tier_shopping_malls_in_business_district_dense,city_winner_malls_dense,shopping_malls_with_street_facing_shops_dense,unranked_malls_dense,community_malls_dense,community_winner_malls_dense,key_focus_malls_dense,transportation_facilities_service_bus_station_dense,transportation_facilities_service_subway_station_dense,transportation_facilities_service_airport_related_dense,transportation_facilities_service_port_terminal_dense,transportation_facilities_service_train_station_dense,transportation_facilities_service_light_rail_station_dense,transportation_facilities_service_long_distance_bus_station_dense,leisure_entertainment_entertainment_venue_game_arcade_dense,leisure_entertainment_entertainment_venue_party_house_dense,leisure_entertainment_cultural_venue_cultural_palace_dense,office_building_industrial_building_industrial_building_dense,medical_health_dense,medical_health_specialty_hospital_dense,medical_health_tcm_hospital_dense,medical_health_physical_examination_institution_dense,medical_health_veterinary_station_dense,medical_health_pharmaceutical_healthcare_dense,medical_health_rehabilitation_institution_dense,medical_health_first_aid_center_dense,medical_health_blood_donation_station_dense,medical_health_disease_prevention_institution_dense,medical_health_general_hospital_dense,medical_health_clinic_dense,education_training_school_education_middle_school_dense,education_training_school_education_primary_school_dense,education_training_school_education_kindergarten_dense,education_training_school_education_research_institution_dense
str,str,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,…,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
"""count""","""7584""",7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,…,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0,7584.0
"""null_count""","""0""",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,…,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"""mean""",,73.309863,7942.349815,37606.217827,27200.010405,107.835311,524.302484,946.842563,109004.567774,18.090389,70.835443,7615.24719,34749.305695,24572.743595,94.301171,407.973847,1021.088323,116916.96743,17.60215,5428.849815,14992.434692,58.273075,23319.614635,63.964775,5969.096141,16412.747806,23990.009526,-0.033755,2790.809931,8525.052108,12242.657404,-0.166446,2919.184127,8754.745907,11706.569548,2021.417722,…,-0.104166,-0.104167,-0.104167,-0.104167,-0.104167,-0.104167,-0.104165,-0.104167,-0.104166,-0.104167,-0.104027,-0.104129,-0.104167,-0.104164,-0.10416,-0.104167,-0.104161,-0.104153,-0.104159,-0.10416,-0.104167,-0.103623,-0.104119,-0.104166,-0.104165,-0.104164,-0.103859,-0.104088,-0.104166,-0.104165,-0.104163,-0.104145,-0.104092,-0.104147,-0.10414,-0.104105,-0.104139
"""std""",,145.51282,15050.631752,29497.652241,45604.912687,70.622713,615.556139,1547.007826,163717.730797,23.706021,92.612832,9545.040118,24407.078378,25796.902389,44.726171,324.206203,1027.491259,109678.794152,12.742778,10018.762158,22669.763617,101.049029,17134.698246,53.246362,5087.8035,13990.462356,15025.074914,0.398361,19026.734032,58441.932968,83909.457785,0.466503,10152.64789,29944.492874,36239.278489,1.51465,…,0.305497,0.305497,0.305497,0.305497,0.305497,0.305497,0.305497,0.305497,0.305497,0.305497,0.305545,0.30551,0.305497,0.305498,0.305499,0.305497,0.305499,0.305501,0.305499,0.305499,0.305497,0.305685,0.305513,0.305497,0.305497,0.305498,0.305603,0.305524,0.305497,0.305497,0.305498,0.305504,0.305522,0.305503,0.305506,0.305518,0.305506
"""min""","""2019 Apr_sector 1""",-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,2019.0,…,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
"""25%""",,3.0,454.0,15793.0,1663.52,92.0,172.22,70.0,10223.0,4.62,15.285714,1897.818182,18339.01491,8005.676667,99.496263,187.079265,243.0,33238.0,10.037143,198.0,440.0,2.0,12531.64557,28.5,2775.4,5541.4656,15278.93229,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2020.0,…,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3e-06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.2e-06,1.77e-07,0.0,0.0,0.0,4e-06,7.52e-07,0.0,0.0,0.0,3.34e-07,7.6e-07,1.02e-07,2.3e-07,1e-06,2.73e-08
"""50%""",,23.0,2958.0,32534.0,11582.38,107.0,363.26,408.0,53569.0,12.03,41.0,4606.0,30671.25442,18749.23857,107.47352,335.245899,721.2,88810.85714,17.056,2595.0,6292.0,29.0,22463.06818,56.833333,5296.833333,14121.85714,22888.39252,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2022.0,…,0.0,0.0,0.0,0.0,0.0,0.0,1.6e-07,0.0,0.0,0.0,1.81e-05,1e-06,0.0,0.0,0.0,0.0,0.0,1.91e-07,3.51e-08,2.37e-07,0.0,4.07e-05,2e-06,0.0,0.0,0.0,2.37e-05,5e-06,0.0,0.0,0.0,2e-06,6e-06,1e-06,1e-06,8e-06,8.68e-07
"""75%""",,80.0,8919.0,53023.0,32404.18,127.0,625.59,1297.0,156686.0,23.58,91.666667,9770.666667,52873.59335,33668.74286,116.631206,602.105138,1505.0,174074.6,24.5675,6970.0,20089.0,77.0,33050.8744,89.4,7973.166667,24351.12857,34822.14595,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2022.0,…,4.08e-07,0.0,0.0,0.0,0.0,0.0,1e-06,0.0,0.0,0.0,6.87e-05,2.02e-05,0.0,0.0,0.0,0.0,0.0,4e-06,9.98e-07,3e-06,0.0,0.000244,2.19e-05,0.0,2.8e-08,0.0,0.000166,3.02e-05,0.0,0.0,7.52e-07,1.09e-05,2.67e-05,6e-06,1e-05,3.28e-05,1e-05
"""max""","""2024 May_sector 96""",2669.0,294430.0,208288.0,606407.64,2003.0,7803.6,12048.0,1220617.0,274.26,990.2,101748.75,107817.8368,315836.8025,286.6625,2231.21875,6158.0,631131.75,100.266667,126073.0,224737.0,1277.0,149937.1502,440.5,43329.0,97268.57143,74599.57237,5.0,465071.07,1715928.0,1876041.0,2.6,155813.4,571976.0,504823.2,2024.0,…,1.15e-05,7.44e-07,2e-06,2e-06,6e-06,0.0,1.92e-05,3e-06,4e-06,0.0,0.001685,0.00068,7.04e-08,0.000232,0.000575,0.0,0.000127,0.000291,0.000209,0.000116,0.0,0.008659,0.000814,4.25e-05,4.61e-05,9.22e-05,0.005056,0.001453,1.16e-05,3.12e-05,9.36e-05,0.000232,0.000872,0.000467,0.000697,0.001162,0.000936
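
A note on reading the summary above: every numeric column has `null_count` 0 and `min` -1.0, and most `*_dense` columns share a mean of about -0.104167 and a std of about 0.305497. That pattern would be consistent with -1 acting as a fill value for missing entries rather than as a real measurement. The sketch below checks the arithmetic under two assumptions that are not stated in the notebook: that roughly 790 of the 7584 rows hold -1 in such a column (790 / 7584 ≈ 0.104167), and that the remaining values are small enough to treat as 0. The helper `sentinel_summary` is hypothetical and exists only for this illustration.

```python
import math

N_ROWS = 7584        # taken from the "count" row of the summary
SENTINEL_ROWS = 790  # assumption: inferred from the mean, not stated in the notebook


def sentinel_summary(n_rows, n_sentinel, sentinel=-1.0):
    """Mean and sample std (ddof=1) of a column that equals `sentinel`
    in `n_sentinel` rows and 0 in all other rows."""
    p = n_sentinel / n_rows
    mean = sentinel * p
    # Population variance of a two-valued column is sentinel^2 * p * (1 - p);
    # multiply by n / (n - 1) to get the sample variance.
    var = (sentinel ** 2) * p * (1 - p) * n_rows / (n_rows - 1)
    return mean, math.sqrt(var)


mean, std = sentinel_summary(N_ROWS, SENTINEL_ROWS)
# Compare with the -0.104167 mean and 0.305497 std reported for many *_dense columns.
print(f"mean={mean:.6f} std={std:.6f}")
```

If the assumption holds, the mean and std of these columns mostly describe the share of placeholder rows, not the density values themselves. Columns such as `transportation_facilities_service_bus_station_dense` (mean -0.104027) deviate slightly because their real values are not exactly 0. Converting -1 to null before summarizing may give more meaningful statistics.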
