### Random Forest Algorithm in Machine Learning:
- It is a machine learning algorithm that uses many decision trees to make better predictions.
- Each tree looks at a different random part of the data, and the results are combined by voting (for classification) or averaging (for regression), which makes it an ensemble learning technique.
- Combining many trees usually gives higher accuracy and less overfitting than a single decision tree.

### Working of the Random Forest Algorithm:
#### 1. Create Many Decision Trees:
- The algorithm builds many decision trees, each trained on a random sample of the data, so every tree is a bit different.
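
As a rough illustration of the "random part of the data" idea, the sketch below draws a bootstrap sample (rows picked with replacement), which is how scikit-learn's random forests sample rows by default. The row count and the variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows = 10                      # hypothetical dataset size
row_ids = np.arange(n_rows)

# Draw n_rows indices with replacement: some rows repeat, others are left out.
bootstrap_ids = rng.choice(row_ids, size=n_rows, replace=True)

# Rows that were never drawn are "out-of-bag" for this tree.
out_of_bag_ids = np.setdiff1d(row_ids, bootstrap_ids)

print("Bootstrap sample:", bootstrap_ids)
print("Rows never drawn (out-of-bag):", out_of_bag_ids)
```

Each tree would get its own draw, so different trees see different mixes of rows.
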
#### 2. Pick Random Features:
- When building each tree, it doesn't look at all the features (columns) at once.
- At each split, it picks a few features at random and chooses the best split among them.
- This helps the trees stay different from one another.
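
The feature-picking step can be sketched as follows. The feature names are the Titanic columns used later in this notebook; the square-root rule for the number of candidates is a common choice for classification (an assumption here, not something the steps above prescribe).

```python
import math
import numpy as np

feature_names = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
rng = np.random.default_rng(42)

# Assumed rule of thumb: consider about sqrt(number of features) per split.
n_candidates = max(1, int(math.sqrt(len(feature_names))))

# Each split gets its own random subset of candidate features.
for split_no in range(3):
    candidates = rng.choice(feature_names, size=n_candidates, replace=False)
    print(f"Split {split_no}: candidate features = {candidates.tolist()}")
```
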
#### 3. Each Tree Makes a Prediction:
- Every tree gives its own prediction based on what it learned from its part of the data.
#### 4. Combine the Predictions:
- For classification, the final answer is the category that most trees agree on, i.e. majority voting.
- For regression, the final answer is the average of all the trees' predictions.
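
The two combination rules can be illustrated with a few made-up tree outputs for a single sample. The numbers below are hypothetical, not taken from any trained model.

```python
import numpy as np

# Hypothetical outputs of five trees for one sample.
class_votes = np.array([1, 0, 1, 1, 0])                  # classification
value_predictions = np.array([2.1, 1.9, 2.4, 2.0, 2.2])  # regression

# Classification: the most frequent class wins.
labels, counts = np.unique(class_votes, return_counts=True)
majority_class = labels[np.argmax(counts)]

# Regression: the forest's answer is the mean of the tree outputs.
average_value = value_predictions.mean()

print("Majority vote:", majority_class)
print("Average prediction:", round(float(average_value), 2))
```

Note that scikit-learn's classifier averages each tree's class probabilities instead of counting hard votes; the two usually agree, but not always.
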
#### 5. Why It Works Well:
- Using random data and features for each tree makes the trees' errors less correlated, so combining them helps avoid overfitting and makes the overall prediction more accurate and stable.
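
Putting the steps together, a hand-rolled "mini forest" might look like the sketch below. It is only an illustration under stated assumptions: the data is synthetic (`make_classification`), the tree count of 25 is arbitrary (odd, to avoid tied votes), and the real `RandomForestClassifier` is implemented differently and more efficiently.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
trees = []
for i in range(25):
    # Step 1: bootstrap sample of the training rows.
    rows = rng.integers(0, len(X_train), size=len(X_train))
    # Step 2: max_features="sqrt" makes each split consider a random feature subset.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[rows], y_train[rows])
    trees.append(tree)

# Step 3: every tree predicts. Step 4: majority vote per test sample.
all_preds = np.array([t.predict(X_test) for t in trees])  # shape (25, n_test)
forest_pred = (all_preds.mean(axis=0) > 0.5).astype(int)

single_tree_acc = (trees[0].predict(X_test) == y_test).mean()
forest_acc = (forest_pred == y_test).mean()
print(f"Single tree accuracy: {single_tree_acc:.3f}")
print(f"Mini-forest accuracy: {forest_acc:.3f}")
```

Comparing the two printed accuracies gives a feel for how much the vote helps on a given dataset.
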

### Implementing Random Forest for Classification Tasks:
#### 1. Importing Libraries:

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import warnings
warnings.filterwarnings('ignore')


#### 2. Load Dataset:
- Load the Titanic dataset.
- Remove rows with missing target values ('Survived').

In [4]:
df = pd.read_csv('titanic.csv')
df = df.dropna(subset=['Survived'])

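#### 3. Split Data:
- Select features such as class, sex, age, and fare.
- Convert 'Sex' to numbers and fill missing 'Age' values with the median age.
- Split the data into training and testing sets (80% train, 20% test).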

In [6]:
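# Copy the selected columns so the edits below don't modify a slice of df
X = df[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']].copy()
y = df['Survived']

# Encode 'Sex' as 0/1 and fill missing ages with the median age
X['Sex'] = X['Sex'].map({'female': 0, 'male': 1})
X['Age'] = X['Age'].fillna(X['Age'].median())
# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)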
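
To see what the two preprocessing lines do, here is the same `map` and `fillna` logic on a tiny hand-made DataFrame. The four rows are invented for the example and are not from the Titanic file.

```python
import pandas as pd

# Tiny made-up table with one missing age.
toy = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Age': [22.0, None, 30.0, 40.0],
})

# Text categories become numbers the model can use.
toy['Sex'] = toy['Sex'].map({'female': 0, 'male': 1})

# The missing age is replaced by the median of the known ages (22, 30, 40).
toy['Age'] = toy['Age'].fillna(toy['Age'].median())

print(toy)
```
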

#### 4. Train and Test Model:
- RandomForestClassifier() - Creates the Random Forest model, which fit() then trains on the training data.
- predict() - Used to predict labels for the test data.

In [9]:
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)

y_pred = rf_classifier.predict(X_test)
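
Besides hard labels, a fitted scikit-learn forest can report the averaged class probabilities behind each prediction through `predict_proba`. The sketch below uses synthetic stand-in data (not the Titanic file) and hypothetical names such as `rf_demo`, purely to show how `predict` and `predict_proba` relate.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data, used only for illustration.
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

rf_demo = RandomForestClassifier(n_estimators=100, random_state=42)
rf_demo.fit(X_demo[:150], y_demo[:150])

labels = rf_demo.predict(X_demo[150:])
probas = rf_demo.predict_proba(X_demo[150:])  # one column per class, rows sum to 1

# The predicted label is the class with the highest averaged probability.
labels_from_proba = rf_demo.classes_[np.argmax(probas, axis=1)]

print("First 5 labels:", labels[:5])
print("First 5 probability rows:\n", probas[:5])
print("Labels match argmax of probabilities:", bool((labels == labels_from_proba).all()))
```
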

#### 5. Evaluate the Accuracy:

In [13]:
accuracy = accuracy_score(y_test, y_pred)
classi_report = classification_report(y_test, y_pred)
print("Accuracy Score:",accuracy)
print("Classification Report:\n",classi_report)

sample = X_test.iloc[0:1]
prediction = rf_classifier.predict(sample)

sample_dict = sample.iloc[0].to_dict()
print(f'Sample Passenger:\n{sample_dict}')
print(f"Predicted Survival:{'Survived' if prediction[0] == 1 else 'Did Not Survive'}")

Accuracy Score: 0.7988826815642458
Classification Report:
               precision    recall  f1-score   support

           0       0.82      0.85      0.83       105
           1       0.77      0.73      0.75        74

    accuracy                           0.80       179
   macro avg       0.79      0.79      0.79       179
weighted avg       0.80      0.80      0.80       179

Sample Passenger:
{'Pclass': 3.0, 'Sex': 1.0, 'Age': 28.0, 'SibSp': 1.0, 'Parch': 1.0, 'Fare': 15.2458}
Predicted Survival:Did Not Survive


### Implementing Random Forest for Regression Tasks:
- This example predicts house prices:
- Load the California housing dataset and create a DataFrame with features and target (median house value, in units of $100,000).
- Separate the features and the target variable.
- Split the data into training and testing sets (80% train, 20% test).
- Initialize and train a Random Forest Regressor using the training data.
- Predict house values on test data and evaluate using MSE and R² score.
- Print a sample prediction and compare it with the actual value.

In [17]:
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

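# Load the dataset and build a DataFrame of features plus the target column
cal_house = fetch_california_housing()
df = pd.DataFrame(cal_house.data, columns=cal_house.feature_names)
df['MEDV'] = cal_house.target  # median house value, in units of $100,000

# Separate the features from the target
X = df.drop('MEDV', axis=1)
y = df['MEDV']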

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf_regress = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regress.fit(X_train, y_train)

y_pred = rf_regress.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Keep the row as a one-row DataFrame so its feature names match those seen during fit
single_data = X_test.iloc[[0]]
predicted_value = rf_regress.predict(single_data)
print(f'Predicted Value:{predicted_value[0]:.2f}')
print(f'Actual Value:{y_test.iloc[0]:.2f}')

print(f'Mean Squared Error:{mse:.2f}')
print(f'R-squared Score:{r2:.2f}')

Predicted Value:0.51
Actual Value:0.48
Mean Squared Error:0.26
R-squared Score:0.81
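
To read the error in everyday terms, the MSE can be converted to a root mean squared error (RMSE), which is in the same units as the target. The California housing target is expressed in units of $100,000, and the MSE below is the rounded value from the output above, so the result is only approximate.

```python
import math

mse = 0.26  # rounded value taken from the output above

# The square root brings the error back to the target's units ($100,000s).
rmse = math.sqrt(mse)

print(f"RMSE: {rmse:.2f}")
print(f"Typical error: about ${rmse * 100_000:,.0f}")
```

By the same scale, the sample prediction of 0.51 corresponds to roughly $51,000 against an actual value of about $48,000.
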
