# 🧪 Lab: Predicting Seismic Building Response with Support Vector Regression

In this lab, you will practice using **Support Vector Machines (SVMs)** for **regression**. Over the past two weeks, we have focused on SVMs for **classification problems**. However, SVMs can also be extended to **regression tasks**; this extension is known as **Support Vector Regression (SVR)**.

To explore this, you will work with the following paper: https://www.nature.com/articles/s41598-024-81705-3

In this paper, the authors use **real seismic monitoring data** to develop an SVR-based model for predicting the **maximum inter-story drift ratio** of buildings during earthquakes. They compare a full-feature model with a reduced-feature version and show that SVR is effective in this complex, real-world regression setting.

You will begin by reading the **introduction** of the paper to understand the motivation behind the problem. Later, you will apply Support Vector Regression to their dataset.

Reference:

- Tao, D., Fang, S., Liu, H. et al. Support vector regression model for the prediction of buildings’ maximum seismic response based on real monitoring data. Sci Rep 14, 29874 (2024). https://doi.org/10.1038/s41598-024-81705-3

--- 

**Software**: Use `scikit-learn` and its implementation of Support Vector Regression in this lab. https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html

---

**Collaboration Note**: This assignment is designed to support collaborative work. We encourage you to divide tasks among group members so that everyone can contribute meaningfully. Many components of the assignment can be approached in parallel or split logically across team members. Good coordination and thoughtful integration of your work will lead to a stronger final result.

--- 

In total, this lab assignment will be worth **100 points**.

## 1. Paper reflection (10 points)

a. After reading the introduction of the paper, discuss the following questions within your group:

   - Why is predicting seismic building response important? (You may consider implications for human safety, economic losses, emergency planning, etc.)
   - Why might the NDE1.0 dataset be particularly well suited to a machine learning approach? (You may think about the volume and diversity of the data, the types of features provided, and the limitations of traditional physics-based methods.)

Please elaborate on your answers.

YOUR TEXT HERE

b. After reading the paper, explain how Support Vector Regression (SVR) works. In addition, explain Formulas (1) to (11) from the paper: what is the intuition behind each equation?
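As a point of reference only, the standard ε-insensitive SVR formulation, in the form documented in the scikit-learn user guide, is written out below. The paper's Formulas (1) to (11) may use different symbols, a different ordering, or additional intermediate steps, so map each of them onto these pieces rather than assuming a one-to-one match.

The ε-insensitive loss ignores residuals smaller than ε and penalizes larger ones linearly:

$$
L_\varepsilon\big(y, f(\mathbf{x})\big) = \max\big(0,\ |y - f(\mathbf{x})| - \varepsilon\big)
$$

The primal problem trades off model flatness (small $\mathbf{w}$) against slack variables $\zeta_i, \zeta_i^*$ that measure how far each point lies above or below the ε-tube, weighted by the penalty $C$:

$$
\begin{aligned}
\min_{\mathbf{w},\, b,\, \boldsymbol{\zeta},\, \boldsymbol{\zeta}^*}\quad & \tfrac{1}{2}\,\mathbf{w}^\top\mathbf{w} + C\sum_{i=1}^{n}\left(\zeta_i + \zeta_i^*\right) \\
\text{subject to}\quad & y_i - \mathbf{w}^\top\phi(\mathbf{x}_i) - b \le \varepsilon + \zeta_i, \\
& \mathbf{w}^\top\phi(\mathbf{x}_i) + b - y_i \le \varepsilon + \zeta_i^*, \\
& \zeta_i,\ \zeta_i^* \ge 0, \qquad i = 1,\dots,n.
\end{aligned}
$$

Solving the dual gives a predictor that depends on the data only through a kernel $K$, for example the RBF kernel $K(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|^2)$:

$$
f(\mathbf{x}) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^*\right) K(\mathbf{x}_i, \mathbf{x}) + b,
\qquad 0 \le \alpha_i,\ \alpha_i^* \le C,
\qquad \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^*\right) = 0.
$$

Points lying strictly inside the ε-tube have $\alpha_i = \alpha_i^* = 0$, so only the support vectors contribute to the prediction.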

YOUR TEXT HERE

## 2. Exploratory Analysis (15 Points)

a. Download the NDE1.0 dataset from [this link](https://www.isterre.fr/annuaire/pages-web-du-personnel/philippe-gueguen/new-earthquake-data-recorded-in-buildings-nde1-0/article/acces-to-the-flatfile.html).

Load the data and display the first few rows.

```python
# [Your code here]
```

b. Explore the distribution of the target variable (Drift) and create scatterplots of Drift against the input features.


```python
# [Your code here]
```
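A minimal exploratory sketch, assuming the flatfile has been loaded into a pandas DataFrame with a numeric `Drift` column. Because the real column names are not given here, the block builds a synthetic stand-in with `make_regression` and hypothetical column names `feature_0`, `feature_1`, and so on; swap in the real DataFrame and feature list. It plots a histogram of Drift, one scatterplot per feature, and a ranking of features by absolute Pearson correlation with Drift.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression

# Synthetic stand-in for the NDE1.0 table (hypothetical column names; replace with the real data).
X, y = make_regression(n_samples=300, n_features=6, n_informative=4, noise=10.0, random_state=0)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
df["Drift"] = y
features = [c for c in df.columns if c != "Drift"]

# 1) Distribution of the target variable.
fig_hist, ax = plt.subplots(figsize=(5, 3))
ax.hist(df["Drift"], bins=30)
ax.set_xlabel("Drift")
ax.set_ylabel("Count")
ax.set_title("Distribution of Drift")

# 2) One scatterplot of Drift against each input feature.
n_cols = 3
n_rows = int(np.ceil(len(features) / n_cols))
fig_scatter, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 3 * n_rows), squeeze=False)
for ax, col in zip(axes.ravel(), features):
    ax.scatter(df[col], df["Drift"], s=5, alpha=0.5)
    ax.set_xlabel(col)
    ax.set_ylabel("Drift")
for ax in axes.ravel()[len(features):]:  # hide unused panels when the grid is not full
    ax.set_visible(False)
fig_scatter.tight_layout()

# 3) Numeric complement to the plots: Pearson correlation with Drift, sorted by magnitude.
corr_with_drift = df[features].corrwith(df["Drift"]).sort_values(key=np.abs, ascending=False)
print(corr_with_drift)
```

In a notebook the figures render inline; in a script, call `plt.show()`. If Drift turns out to be strongly right-skewed, a log scale on the Drift axis may make patterns easier to see. Pearson correlation only captures linear association, so read the ranking alongside the scatterplots.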

c. Highlight any notable patterns you observe in the visualizations. Which input features seem most likely to be influential in predicting Drift?

YOUR TEXT HERE

## 3. Fit and test (30 points)

a. Fit and test an SVR model using all available features. This process should include automatic tuning of both the kernel and the penalty term (C). Evaluate the model using at least the coefficient of determination (R²) and the mean squared error (MSE).

```python
# [Your code here]
```
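One possible sketch of this step, assuming `X` (features) and `y` (Drift) arrays; here `make_regression` generates synthetic placeholders so the block runs on its own. It wraps `StandardScaler` and `SVR` in a `Pipeline` so that scaling is refit inside every cross-validation fold, and uses `GridSearchCV` to tune the kernel and `C` jointly. The grid values, the 80/20 split, and 5-fold CV are illustrative choices, not values taken from the paper.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic placeholders; replace X, y with the NDE1.0 features and Drift.
X, y = make_regression(n_samples=400, n_features=20, n_informative=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling lives inside the pipeline, so each CV fold is scaled using only its own training part.
svr_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svr", SVR()),
])

# Joint search over kernel and penalty C; the "svr__" prefix routes each parameter to the SVR step.
param_grid = {
    "svr__kernel": ["linear", "rbf", "poly"],
    "svr__C": [0.1, 1, 10, 100],
}
svr_search = GridSearchCV(svr_pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
svr_search.fit(X_train, y_train)

# Evaluate the refit best model on the held-out test set.
y_pred_svr = svr_search.predict(X_test)
svr_r2 = r2_score(y_test, y_pred_svr)
svr_mse = mean_squared_error(y_test, y_pred_svr)
print("Best params:", svr_search.best_params_)
print(f"SVR  R2 = {svr_r2:.3f}  MSE = {svr_mse:.3f}")
```

`epsilon` (and `gamma` for the RBF kernel) can be added to `param_grid` in the same way. Since `SVR`'s `epsilon` is expressed in the units of the target, it is worth checking its default (0.1) against the scale of Drift.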

b. Repeat the previous process using a Ridge linear regression model. Be sure to include automatic tuning of the model’s hyperparameters, specifically `alpha`.

```python
# [Your code here]
```
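A sketch of the Ridge counterpart under the same assumptions (synthetic placeholder data from `make_regression`, an 80/20 split, 5-fold CV). `alpha` is searched over a log-spaced grid, an illustrative range; using the same train/test split for Ridge and SVR keeps the comparison fair.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholders; replace with the NDE1.0 features and Drift.
X, y = make_regression(n_samples=400, n_features=20, n_informative=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

ridge_pipe = Pipeline([
    ("scaler", StandardScaler()),  # the L2 penalty is scale-sensitive, so standardize first
    ("ridge", Ridge()),
])

# Log-spaced alpha grid from 1e-3 to 1e3 (illustrative range).
ridge_search = GridSearchCV(
    ridge_pipe,
    {"ridge__alpha": np.logspace(-3, 3, 13)},
    cv=5,
    scoring="neg_mean_squared_error",
)
ridge_search.fit(X_train, y_train)

y_pred_ridge = ridge_search.predict(X_test)
ridge_r2 = r2_score(y_test, y_pred_ridge)
ridge_mse = mean_squared_error(y_test, y_pred_ridge)
print("Best alpha:", ridge_search.best_params_["ridge__alpha"])
print(f"Ridge  R2 = {ridge_r2:.3f}  MSE = {ridge_mse:.3f}")
```

`RidgeCV` is a built-in alternative that tunes `alpha` itself (by default with an efficient leave-one-out scheme).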

c. Use predicted vs. true value scatterplots to compare the SVR and Ridge models. Embed performance metrics (e.g., R², MSE) in each plot.

Which model seems to perform better at this task?

```python
# [Your code here]
```
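A sketch of a reusable plotting helper, `plot_pred_vs_true` (a name introduced here for illustration), that draws predicted vs. true values with a y = x reference line and writes R² and MSE into each panel. To stay self-contained it fits two untuned models with fixed, arbitrary hyperparameters on synthetic data; in the lab, pass in the test-set predictions of your tuned SVR and Ridge models instead.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def plot_pred_vs_true(ax, y_true, y_pred, title):
    """Scatter predicted vs. true values, draw y = x, and embed R2 and MSE in the panel."""
    r2 = r2_score(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    ax.scatter(y_true, y_pred, s=10, alpha=0.6)
    lo = min(np.min(y_true), np.min(y_pred))
    hi = max(np.max(y_true), np.max(y_pred))
    ax.plot([lo, hi], [lo, hi], "k--", lw=1, label="perfect prediction")
    ax.set_xlabel("True Drift")
    ax.set_ylabel("Predicted Drift")
    ax.set_title(title)
    ax.text(0.05, 0.95, f"$R^2$ = {r2:.3f}\nMSE = {mse:.3g}", transform=ax.transAxes,
            va="top", bbox=dict(boxstyle="round", facecolor="white", alpha=0.8))
    ax.legend(loc="lower right")
    return r2, mse


# Placeholder data and untuned models (illustrative hyperparameters only).
X, y = make_regression(n_samples=400, n_features=20, n_informative=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
models = {
    "SVR (linear kernel, C=10)": make_pipeline(StandardScaler(), SVR(kernel="linear", C=10)),
    "Ridge (alpha=1)": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
}

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
metrics = {}
for ax, (name, model) in zip(axes, models.items()):
    model.fit(X_train, y_train)
    metrics[name] = plot_pred_vs_true(ax, y_test, model.predict(X_test), name)
fig.tight_layout()
```

Sharing both axes across panels makes the two scatter clouds directly comparable; points far from the dashed line show where each model over- or under-predicts.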

## 4. Reduced SVR Model (30 points)

a. Include a feature selection step in the SVR pipeline that selects the 10 features most correlated with Drift. Retrain and evaluate the model after this step.

```python
# [Your code here]
```
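A sketch of the reduced pipeline, assuming synthetic placeholder data and hypothetical feature names. It uses `SelectKBest` with `f_regression`: for a single feature, the F-statistic is a monotonically increasing function of its squared Pearson correlation with the target, so keeping the top 10 F scores keeps the 10 features with the largest absolute correlation with Drift. The kernel/`C` grid is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic placeholders (hypothetical feature names); replace with the NDE1.0 features and Drift.
X, y = make_regression(n_samples=400, n_features=20, n_informative=8, noise=5.0, random_state=0)
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Selection sits inside the pipeline, so it is refit on the training part of every CV fold.
reduced_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("select", SelectKBest(score_func=f_regression, k=10)),
    ("svr", SVR()),
])
param_grid = {"svr__kernel": ["linear", "rbf"], "svr__C": [0.1, 1, 10, 100]}
reduced_search = GridSearchCV(reduced_pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
reduced_search.fit(X_train, y_train)

y_pred_reduced = reduced_search.predict(X_test)
reduced_r2 = r2_score(y_test, y_pred_reduced)
reduced_mse = mean_squared_error(y_test, y_pred_reduced)

# Features kept by the final refit on the whole training set.
selected_mask = reduced_search.best_estimator_.named_steps["select"].get_support()
selected_features = feature_names[selected_mask]
print("Selected:", list(selected_features))
print(f"Reduced SVR  R2 = {reduced_r2:.3f}  MSE = {reduced_mse:.3f}")
```

Selecting features outside the pipeline, on the full dataset, would leak information from the validation folds and the test set into the choice of features. Recent scikit-learn versions also provide `r_regression`, which returns the signed correlations themselves if you want to report them.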

Subsequently, answer the following questions:

- How much does the performance change?
- What are the practical advantages of this smaller model?

Elaborate on your answers.

b. Compare the performance of this reduced-feature pipeline with a Lasso regression model using all features.
- Do both models select the same features?
- How do their performances compare?

```python
# [Your code here]
```
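A sketch of the Lasso comparison, again on synthetic placeholder data with hypothetical feature names. `LassoCV` tunes `alpha` by cross-validation on all standardized features; the features Lasso "selects" are those with non-zero coefficients, which are then compared with the 10 features most correlated with the target.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholders (hypothetical feature names); replace with the NDE1.0 features and Drift.
X, y = make_regression(n_samples=400, n_features=20, n_informative=8, noise=5.0, random_state=0)
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Lasso on all standardized features; alpha chosen by 5-fold cross-validation.
lasso_pipe = make_pipeline(StandardScaler(), LassoCV(cv=5))
lasso_pipe.fit(X_train, y_train)
lasso = lasso_pipe.named_steps["lassocv"]
y_pred_lasso = lasso_pipe.predict(X_test)
lasso_r2 = r2_score(y_test, y_pred_lasso)
lasso_mse = mean_squared_error(y_test, y_pred_lasso)

# Features Lasso keeps (non-zero coefficients) vs. the 10 most correlated with the target.
lasso_selected = set(feature_names[lasso.coef_ != 0])
corr_mask = SelectKBest(f_regression, k=10).fit(X_train, y_train).get_support()
corr_selected = set(feature_names[corr_mask])

print(f"Lasso  R2 = {lasso_r2:.3f}  MSE = {lasso_mse:.3f}  alpha = {lasso.alpha_:.4g}")
print("Kept by both:     ", sorted(corr_selected & lasso_selected))
print("Only correlation: ", sorted(corr_selected - lasso_selected))
print("Only Lasso:       ", sorted(lasso_selected - corr_selected))
```

The two criteria differ in kind: correlation ranks each feature on its own, whereas Lasso fits all coefficients jointly. A feature that is correlated with Drift only through another feature may therefore be dropped by Lasso, and Lasso is not constrained to keep exactly 10 features.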

## 5. Robustness to Point Removal (10 points)

a. Compare the sensitivity of SVR and Ridge Regression to small changes in the training set. To achieve this, randomly remove 10 training points, retrain both models, and observe how much the predictions change.
- Which model appears more stable?
- Why?



```python
# [Your code here]
```
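A sketch of the point-removal experiment, assuming synthetic placeholder data and fixed, illustrative hyperparameters (`C=100` for an RBF SVR, `alpha=1.0` for Ridge); in the lab, use the values selected by your tuning. Rather than a single random draw, it repeats the removal 20 times (an arbitrary choice) and records the mean absolute change in test-set predictions relative to models trained on the full training set.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic placeholders; replace with the NDE1.0 features and Drift.
X, y = make_regression(n_samples=400, n_features=20, n_informative=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


def fit_models(X_tr, y_tr):
    """Fit SVR and Ridge with fixed, illustrative hyperparameters."""
    svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100)).fit(X_tr, y_tr)
    ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_tr, y_tr)
    return svr, ridge


# Reference predictions from models trained on the full training set.
svr_full, ridge_full = fit_models(X_train, y_train)
base_svr = svr_full.predict(X_test)
base_ridge = ridge_full.predict(X_test)

rng = np.random.default_rng(0)
n_repeats, n_removed = 20, 10
svr_shift, ridge_shift = [], []
for _ in range(n_repeats):
    keep = np.ones(len(X_train), dtype=bool)
    keep[rng.choice(len(X_train), size=n_removed, replace=False)] = False  # drop 10 random points
    svr_r, ridge_r = fit_models(X_train[keep], y_train[keep])
    # Mean absolute change in test-set predictions relative to the full-data models.
    svr_shift.append(np.mean(np.abs(svr_r.predict(X_test) - base_svr)))
    ridge_shift.append(np.mean(np.abs(ridge_r.predict(X_test) - base_ridge)))

print(f"SVR   mean |change| = {np.mean(svr_shift):.4f} (sd {np.std(svr_shift):.4f})")
print(f"Ridge mean |change| = {np.mean(ridge_shift):.4f} (sd {np.std(ridge_shift):.4f})")
```

When interpreting the numbers, it may help to recall that every training point enters the Ridge solution, whereas an SVR fit with fixed scaling and hyperparameters is unchanged by removing a point that is not a support vector. Refitting the scaler on the reduced set perturbs both models slightly, and comparing shifts is only meaningful relative to the scale of Drift.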

## 6. Collaboration Reflection (5 points)

As a group, briefly reflect on the following (max 1–2 short paragraphs):

- How did the group dynamics work throughout the assignment?
- Were there any major disagreements or diverging approaches?
- How did you resolve conflicts or make final modeling decisions?
- What did you learn from each other during this project?

YOUR TEXT HERE