# Economic forecast based on machinery data

<i> Shun Ye CHEN, Wenbo DUAN, Louis NEL, Baoyue ZHANG </i>

## Introduction

Infrastructure development is a key driver of economic growth, and the demand for construction machinery provides valuable insights into regional economic activity. By analyzing loan data for construction equipment, we can uncover patterns that indicate economic trends across different cities in China.

The task is to predict regional economic growth from working hours data for construction machinery recorded across various Chinese cities.

Forecasting of this kind is routine work for economists. Understanding economic trends at the city level is crucial for policymakers, businesses, and investors. A reliable predictive model could help financial institutions optimize lending strategies, aid governments in infrastructure planning, and provide businesses with insights into emerging market opportunities.

The provided data comes from industry partners and the China Bureau of Statistics and covers the period from January 2020 to August 2023, offering a unique opportunity to work with financial and economic indicators tied to infrastructure investments.

# Exploratory data analysis

The goal of this section is to show what is in the data and how to work with it.
This is the first step in any data science project, and here, you should give a sense of the data the participants will be working with.

You can first load and describe the data, and then show some interesting properties of it.

In [2]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)

# Load the data

import problem
X_df, y = problem.get_train_data()

FileNotFoundError: [Errno 2] No such file or directory: 'data\\X_train.csv'
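
Since the loading cell above fails when the data files are missing, the kind of first look intended here can only be sketched on synthetic data. The block below is a minimal illustration: the column names (`city`, `month`, `working_hours`) and the values are hypothetical stand-ins, not the actual schema of `X_df`. Only the date range is taken from the introduction.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for X_df. All column names and values are hypothetical.
rng = np.random.default_rng(0)
months = pd.date_range("2020-01-01", "2023-08-01", freq="MS")  # monthly steps
cities = ["city_a", "city_b", "city_c"]

toy_df = pd.DataFrame({
    "city": np.repeat(cities, len(months)),
    "month": np.tile(months.to_numpy(), len(cities)),
    "working_hours": rng.gamma(shape=5.0, scale=40.0,
                               size=len(cities) * len(months)),
})

# Basic description: size, column types, missing values.
print(toy_df.shape)
print(toy_df.dtypes)
print(toy_df.isna().sum())

# Per-city summary statistics of the (hypothetical) working-hours column.
summary = toy_df.groupby("city")["working_hours"].describe()
print(summary)

# One column per city, indexed by month: a convenient shape for line plots.
monthly = toy_df.pivot(index="month", columns="city", values="working_hours")
print(monthly.head())
```

With the real data loaded, the same calls can be applied to `X_df` directly, and `monthly.plot()` would draw one line per city.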

# Challenge evaluation

A particularly important point in a challenge is to describe how it is evaluated. This is the section where you should describe the metric that will be used to evaluate the participants' submissions, as well as your evaluation strategy, in particular if there is some complexity in the way the data should be split to ensure valid results.
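
The source does not state how this challenge splits its data, so the following is only a sketch of one option that is often considered for monthly data: an expanding-window split in which every test month comes after all training months. The helper `expanding_month_splits`, its parameters, and the toy `month` values are assumptions made for illustration. They are not part of `ramp-workflow` or of the challenge definition.

```python
import numpy as np
import pandas as pd


def expanding_month_splits(months, n_splits=3, test_size=6):
    """Yield (train_idx, test_idx) so that test months follow train months.

    Hypothetical helper: `months` holds one date per row, `test_size` is the
    number of distinct months in each test fold.
    """
    month_values = pd.to_datetime(pd.Series(months)).to_numpy()
    unique_months = np.unique(month_values)  # sorted distinct months
    for k in range(n_splits, 0, -1):
        test_end = len(unique_months) - (k - 1) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough months for the requested splits")
        train_mask = month_values < unique_months[test_start]
        test_mask = (month_values >= unique_months[test_start]) & (
            month_values <= unique_months[test_end - 1]
        )
        yield np.flatnonzero(train_mask), np.flatnonzero(test_mask)


# Toy panel: 3 cities observed every month from January 2020 to August 2023.
all_months = pd.date_range("2020-01-01", "2023-08-01", freq="MS").to_numpy()
toy_months = np.tile(all_months, 3)

splits = list(expanding_month_splits(toy_months, n_splits=3, test_size=6))
for train_idx, test_idx in splits:
    print(len(train_idx), "train rows,", len(test_idx), "test rows")
```

A list of index pairs like `splits` can be passed as the `cv` argument of scikit-learn's `cross_val_score`, which keeps later months out of the training folds.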

# Submission format

Here, you should describe the submission format. This is the format the participants should follow to submit their predictions on the RAMP platform.

This section also shows how to use the `ramp-workflow` library to test the submission locally.

## The pipeline workflow

The input data are stored in a dataframe. To go from a dataframe to a NumPy array, we use a scikit-learn column transformer. The first example simply selects the subset of columns we want to work with.
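
The column-selection step described above could look roughly like the sketch below. It is not the starting-kit estimator: the function name `get_estimator_with_selection`, the column names, and the binary toy label are all hypothetical, and the data are synthetic. Only `make_column_transformer`, `make_pipeline`, `StandardScaler`, and `LogisticRegression` are real scikit-learn components.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical column names, for illustration only.
SELECTED_COLUMNS = ["working_hours", "n_machines"]


def get_estimator_with_selection():
    # Scale the selected columns; every other column is dropped.
    selector = make_column_transformer(
        (StandardScaler(), SELECTED_COLUMNS),
        remainder="drop",
    )
    return make_pipeline(selector, LogisticRegression())


# Synthetic dataframe with one column that the transformer should ignore.
rng = np.random.default_rng(0)
toy_X = pd.DataFrame({
    "city": ["city_a", "city_b"] * 20,
    "working_hours": rng.normal(200, 30, 40),
    "n_machines": rng.integers(10, 100, 40),
})
# Made-up binary label, only so that the classifier has something to fit.
toy_y = (toy_X["working_hours"] > 200).astype(int).to_numpy()

model = get_estimator_with_selection().fit(toy_X, toy_y)

# The steps before the classifier turn the dataframe into a NumPy array.
transformed = model[:-1].transform(toy_X)
print(transformed.shape)
```

Passing column names rather than positions keeps the selection readable and robust to column reordering in the input dataframe.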

In [None]:
# %load submissions/starting_kit/estimator.py

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression


def get_estimator():
    pipe = make_pipeline(
        StandardScaler(),
        LogisticRegression()
    )

    return pipe


## Testing using a scikit-learn pipeline

In [None]:
from sklearn.model_selection import cross_val_score

scores = cross_val_score(get_estimator(), X_df, y, cv=5, scoring='accuracy')
print(scores)

NameError: name 'get_estimator' is not defined

## Submission

To submit your code, you can refer to the [online documentation](https://paris-saclay-cds.github.io/ramp-docs/ramp-workflow/stable/using_kits.html).