## Title: Sberbank Russian Housing Market

### Purpose:
This Python script analyzes the Sberbank Russian Housing Market dataset in five stages: data cleaning and preprocessing, exploratory data analysis (EDA), data visualization, and hypothesis testing, preceded by loading the data.

### Libraries Used:
- pandas (pd): For data manipulation and analysis.
- numpy (np): For numerical computing.
- seaborn (sns): For data visualization.
- matplotlib.pyplot (plt): For plotting graphs.
- scipy.stats.ttest_ind: For conducting a t-test.
- sklearn.model_selection.train_test_split: For splitting data into training and testing sets.
- sklearn.linear_model.LinearRegression: For linear regression modeling.
- sklearn.metrics.mean_squared_error: For evaluating model performance.

### Steps:

1. **Load the Dataset:**
    - Uses `pd.read_csv()` to load the dataset from a CSV file named 'train_without_noise.csv'.
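
The loading step could look roughly like the sketch below. The helper name `load_dataset` and the three-row in-memory CSV are illustrative assumptions (the values are made up so the example runs without the real file); in the script itself the argument would be the path 'train_without_noise.csv'.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read a CSV (path or file-like object) into a DataFrame and report its shape."""
    data = pd.read_csv(source)
    print(f"Loaded {data.shape[0]} rows and {data.shape[1]} columns")
    return data


# Hypothetical stand-in for 'train_without_noise.csv'; values are invented.
sample_csv = io.StringIO(
    "full_sq,life_sq,num_room,material,price_doc\n"
    "43,27,2,1,5850000\n"
    "34,19,1,1,6000000\n"
    "89,50,3,2,13100000\n"
)

data = load_dataset(sample_csv)
print(data.head())
```

With the real file, the call would simply be `load_dataset('train_without_noise.csv')`.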

2. **Data Cleaning and Preprocessing (Section B):**
    - Handles missing data by dropping every row that contains at least one null value, using `data.dropna(inplace=True)`.
    - Checks the 'full_sq' column for outliers and anomalies with a box plot generated by `sns.boxplot(x=data['full_sq'])`.
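
A minimal sketch of the cleaning step follows. Dropping nulls mirrors the `dropna` call above. The 1.5 × IQR filter is an assumption added for illustration: it is the same rule a box plot uses to draw its whiskers, but the description above does not say the script removes outliers this way. The sample values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one missing value and one implausibly large area.
data = pd.DataFrame({
    "full_sq": [43.0, 34.0, 89.0, 52.0, np.nan, 61.0, 5326.0, 47.0],
    "price_doc": [5.85e6, 6.0e6, 1.31e7, 7.2e6, 4.9e6, 8.1e6, 6.5e6, 6.9e6],
})

# Drop every row with at least one null (non-inplace variant of the call above).
cleaned = data.dropna()

# Assumed outlier rule: flag values outside 1.5 * IQR of the quartiles,
# which matches the whisker limits of a standard box plot.
q1 = cleaned["full_sq"].quantile(0.25)
q3 = cleaned["full_sq"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

is_outlier = ~cleaned["full_sq"].between(lower, upper)
filtered = cleaned[~is_outlier]

print(f"Rows after dropna: {len(cleaned)}")
print(f"Outliers flagged:  {int(is_outlier.sum())}")
print(f"Rows kept:         {len(filtered)}")
```

Whether flagged rows should be dropped, capped, or kept depends on the data; the sketch only shows one common option.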

3. **Exploratory Data Analysis (EDA) (Section C):**
    - Computes summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns using `data.describe()`.
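
As a small illustration of what `describe()` returns, the sketch below runs it on an invented four-row frame and reads individual statistics back out of the result. The column values are assumptions, not figures from the dataset.

```python
import pandas as pd

# Hypothetical values, chosen only to keep the arithmetic easy to follow.
data = pd.DataFrame({
    "full_sq": [40.0, 50.0, 60.0, 70.0],
    "price_doc": [5.0e6, 6.0e6, 7.0e6, 8.0e6],
})

# One column of statistics per numeric column; rows are count, mean, std,
# min, 25%, 50%, 75%, max.
summary = data.describe()
print(summary)

# Individual statistics can be looked up by row label and column name.
mean_area = summary.loc["mean", "full_sq"]
median_price = summary.loc["50%", "price_doc"]
print(f"Mean full_sq: {mean_area}, median price_doc: {median_price}")
```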

4. **Data Visualization (Section D):**
    - Plots a histogram of the 'full_sq' column using `sns.histplot(data['full_sq'], bins=20)`.
    - Creates a box plot of 'price_doc' vs 'num_room' using `sns.boxplot(y='price_doc', x='num_room', data=data)`.
    - Generates a scatter plot of 'life_sq' vs 'price_doc' using `sns.scatterplot(x='life_sq', y='price_doc', data=data)`.

5. **Hypothesis Testing (Section E):**
    - Defines the null hypothesis (mean 'price_doc' does not differ between houses with different 'material' values) and the alternative hypothesis (the means differ).
    - Conducts an independent two-sample t-test on the 'price_doc' values of houses with 'material' values of 1 and 2 using `ttest_ind(material_1, material_2)`, which returns the t-statistic and the p-value.
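
A minimal sketch of the test is shown below. The `ttest_ind(material_1, material_2)` call matches the one above; the synthetic prices, the seed, and the 0.05 significance level are assumptions, since the description does not state which threshold the script uses.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Synthetic prices for two material groups; the means and spread are invented.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "material": np.repeat([1, 2], 50),
    "price_doc": np.concatenate([
        rng.normal(7.0e6, 1.5e6, 50),
        rng.normal(6.5e6, 1.5e6, 50),
    ]),
})

# Split price_doc by material value.
material_1 = data.loc[data["material"] == 1, "price_doc"]
material_2 = data.loc[data["material"] == 2, "price_doc"]

# Independent two-sample t-test (assumes equal variances by default).
t_stat, p_value = ttest_ind(material_1, material_2)

alpha = 0.05  # assumed significance level
reject_null = p_value < alpha

print(f"t-statistic: {t_stat:.3f}, p-value: {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

If the two groups may have unequal variances, passing `equal_var=False` to `ttest_ind` runs Welch's t-test instead.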

### Conclusion:
The script works through the dataset in a fixed order: cleaning and preprocessing, EDA, visualization, and hypothesis testing. Together, these steps help reveal relationships and patterns within the data.
