# Mercedes-Benz Manufacturing Time Prediction

This project uses LightGBM regression to predict Mercedes-Benz manufacturing times from vehicle features. The notebook one-hot encodes the production data, trains on the labeled historical records, and saves both the model and its feature names so that a simple web interface can serve real-time predictions. The aim is to optimize manufacturing efficiency.

Dataset: https://www.kaggle.com/competitions/mercedes-benz-greener-manufacturing/data

Hugging Face: https://huggingface.co/spaces/alperugurcan/time-prediction

In [2]:
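import joblib
import pandas as pd
from lightgbm import LGBMRegressor

DATA_DIR = '/kaggle/input/mercedes-benz-greener-manufacturing'

def process_data():
    """Load train and test, and one-hot encode them together."""
    train = pd.read_csv(f'{DATA_DIR}/train.csv.zip')
    test = pd.read_csv(f'{DATA_DIR}/test.csv.zip')

    # Encode both splits in one frame so they end up with identical columns,
    # even for categories that appear in only one split.
    # test has no 'y' column, so its rows hold NaN there after the concat.
    data = pd.concat([train, test]).set_index('ID')
    data = pd.get_dummies(data)

    # Split back by position: the first len(train) rows are the training set.
    train_processed = data.iloc[:len(train)]
    test_processed = data.iloc[len(train):].drop(columns='y')

    return train_processed, test_processed

def main():
    train, test = process_data()
    X_train, y_train = train.drop(columns='y'), train['y']

    model = LGBMRegressor(random_state=42)
    model.fit(X_train, y_train)

    # Persist the model and the training column order for later inference.
    joblib.dump(model, 'mercedes_model.joblib')
    joblib.dump(X_train.columns.tolist(), 'feature_names.joblib')

    # Write predictions in the Kaggle submission format (ID, y).
    predictions = model.predict(test)
    pd.DataFrame({'ID': test.index, 'y': predictions}).to_csv('submission.csv', index=False)

if __name__ == "__main__":
    main()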

[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007684 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 794
[LightGBM] [Info] Number of data points in the train set: 4209, number of used features: 397
[LightGBM] [Info] Start training from score 100.669318
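
The training cell saves `mercedes_model.joblib` and `feature_names.joblib`, but the notebook does not show how they are used at prediction time. The block below is a minimal sketch of one way a serving layer could do it, not the actual code behind the web interface. The helper names `align_features` and `predict_time`, and the toy column names, are hypothetical. The idea is that new rows are one-hot encoded and then reindexed to the saved column order, so categories absent from a request become 0 and categories unseen in training are dropped.

```python
import pandas as pd


def align_features(raw: pd.DataFrame, feature_names: list[str]) -> pd.DataFrame:
    """One-hot encode new rows and match the training column layout."""
    encoded = pd.get_dummies(raw)
    # Training columns missing from this batch are added and filled with 0;
    # columns that were not seen during training are dropped.
    aligned = encoded.reindex(columns=feature_names, fill_value=0)
    return aligned.astype(float)


def predict_time(raw: pd.DataFrame,
                 model_path: str = "mercedes_model.joblib",
                 features_path: str = "feature_names.joblib"):
    """Hypothetical inference helper: load the saved artifacts and predict."""
    import joblib  # imported here so align_features works without joblib

    model = joblib.load(model_path)
    feature_names = joblib.load(features_path)
    return model.predict(align_features(raw, feature_names))


# Toy illustration with made-up column names (not the real feature list).
toy_features = ["X10", "X0_a", "X0_b"]
toy_row = pd.DataFrame({"X0": ["b"], "X10": [1]})
aligned = align_features(toy_row, toy_features)
print(aligned)
```

With the real artifacts on disk, `predict_time(new_rows)` would return one predicted time per row, assuming `new_rows` carries the same raw columns as the training CSV without `ID` and `y`.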
