```python
# Shape of training data (num_rows, num_columns)
print(X_train.shape)

# Number of missing values in each column of training data
missing_val_count_by_column = (X_train.isnull().sum())
print(missing_val_count_by_column[missing_val_count_by_column > 0])
```
Output:
```
(1168, 36)
LotFrontage    212
MasVnrArea       6
GarageYrBlt     58
dtype: int64
```
This means the training data has 1168 rows and 36 columns. Three columns contain missing values, for a total of 212 + 6 + 58 = 276 missing entries.
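
The same arithmetic can be done in code. Below is a minimal sketch on a made-up frame (the `toy` DataFrame and its values are hypothetical, not the real housing data) that counts the total number of missing entries and the fraction missing per column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only for illustration
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, np.nan],
    "MasVnrArea": [196.0, 0.0, np.nan, 350.0],
    "LotArea": [8450, 9600, 11250, 9550],
})

# Missing entries per column, then summed over all columns
missing_by_column = toy.isnull().sum()
total_missing = int(missing_by_column.sum())

# Fraction of rows that are missing in each column
share_missing = toy.isnull().mean()

print(total_missing)                 # 3
print(share_missing["LotFrontage"])  # 0.5
```

The per-column fraction is often more useful than the raw count when deciding whether a column is worth keeping.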


In pandas, the `axis` parameter of `drop()` specifies whether the labels you pass refer to rows (`axis=0`, the default) or to columns (`axis=1`). We want to remove columns here, so we pass `axis=1`.
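
A small sketch of the difference, using a made-up frame (the names `df`, `without_row` and `without_col` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# axis=0: the label "y" is looked up in the row index
without_row = df.drop("y", axis=0)

# axis=1: the label "b" is looked up in the column names
without_col = df.drop("b", axis=1)

print(without_row.shape)  # (2, 2)
print(without_col.shape)  # (3, 1)
```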


```python
# Get names of columns with missing values
cols_with_missing = [col for col in X_train.columns
                     if X_train[col].isnull().any()]

# Drop columns in training and validation data
reduced_X_train = X_train.drop(cols_with_missing, axis=1)
reduced_X_valid = X_valid.drop(cols_with_missing, axis=1)
```
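A shorter alternative is `dropna(axis=1)`. Call it on the training data only and reuse the resulting columns for the validation data. Calling `dropna` on each frame separately can leave them with different columns if their missing values fall in different places:
```python
reduced_X_train = X_train.dropna(axis=1)
reduced_X_valid = X_valid[reduced_X_train.columns]
```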
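
One thing to watch for with `dropna(axis=1)`: applied to each frame independently, it can drop different columns from each. A minimal sketch with made-up frames (`train` and `valid` are hypothetical) shows the mismatch and one way to keep the columns aligned:

```python
import numpy as np
import pandas as pd

# Hypothetical frames whose missing values sit in different columns
train = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0], "c": [5.0, 6.0]})
valid = pd.DataFrame({"a": [7.0, 8.0], "b": [np.nan, 9.0], "c": [10.0, 11.0]})

# Independent dropna calls: each frame loses a different column
print(list(train.dropna(axis=1).columns))  # ['b', 'c']
print(list(valid.dropna(axis=1).columns))  # ['a', 'c']

# Taking the column list from the training data keeps both frames aligned
reduced_train = train.dropna(axis=1)
reduced_valid = valid[reduced_train.columns]
print(list(reduced_valid.columns))  # ['b', 'c']
```

Note that `reduced_valid` can still contain missing values in the kept columns. That is also true of the `drop(cols_with_missing, axis=1)` version, since both take the column list from the training data.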

### Imputation
```python
from sklearn.impute import SimpleImputer

# Create an instance of SimpleImputer with the desired strategy
# The default strategy is 'mean'
imputer = SimpleImputer(strategy='mean')

# Fit the imputer to the data (find the mean for each column)
imputer.fit(X)

# Transform the data by replacing missing values with the computed means
X_imputed = imputer.transform(X)
```
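
To see what the two steps do, here is a minimal sketch on a made-up frame (the `toy` DataFrame is hypothetical). After `fit`, the learned column means are stored in the imputer's `statistics_` attribute, and `transform` writes them into the gaps:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with one gap per column
toy = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [10.0, 20.0, np.nan]})

imputer = SimpleImputer(strategy="mean")
imputer.fit(toy)

# Column means learned from the non-missing values: 2.0 for x, 15.0 for y
print(imputer.statistics_)

# transform returns a NumPy array with the gaps filled by those means
filled = imputer.transform(toy)
print(filled)
```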
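Don't forget to fit the imputer before calling `transform`. Fit it on the training data only, then use the same fitted imputer to transform the validation data.

We can also fit and transform the training data in one step with `fit_transform`: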
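```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_absolute_error

# Fit on the training data, then apply the same imputer to the validation data
my_imputer = SimpleImputer()
imputed_X_train = pd.DataFrame(my_imputer.fit_transform(X_train))
imputed_X_valid = pd.DataFrame(my_imputer.transform(X_valid))

# Imputation returns a NumPy array, so restore the column names
imputed_X_train.columns = X_train.columns
imputed_X_valid.columns = X_valid.columns

# Now define a model
model_forest = RandomForestRegressor(n_estimators=100, random_state=0)
model_forest.fit(imputed_X_train, y_train)
preds = model_forest.predict(imputed_X_valid)

# Evaluate the model
print(mean_absolute_error(y_valid, preds))
```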