# Introduction
In this project, we classify close cricket matches using a dataset of historical matches. A match counts as close when the winning margin is small: at most 20 runs for the side batting first, or at most 4 wickets for the side chasing. We train a Decision Tree Classifier to predict whether a match will be close from three binary features: whether the Duckworth-Lewis (DL) method was applied, whether the match was played in April, and whether the toss winner chose to field.

## Data Loading and Initial Exploration
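First, we load the dataset and compute a few summary statistics: the proportion of matches in which the DL method was applied, the proportion won by the team batting first, the number of matches played in April, and the number of matches in which the toss winner chose to field.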

In [None]:
import pandas as pd

# Load the dataset
matches = pd.read_excel('matches.xlsx')

# Calculate the proportion of matches where DL method was applied
dl_applied_matches = matches['dl_applied'].sum()
total_matches = matches.shape[0]
proportion_dl_applied = dl_applied_matches / total_matches

print(f"Proportion of matches where DL method was applied: {proportion_dl_applied:.4f}")

# Calculate the proportion of matches won by the team who batted first
# (a positive 'win_by_runs' means the side batting first won)
matches_batting_first_won = matches[matches['win_by_runs'] > 0].shape[0]
proportion_batting_first_won = matches_batting_first_won / total_matches

print(f"Proportion of matches won by the team who batted first: {proportion_batting_first_won:.4f}")

# Create a new column to check if the match was played in April
matches['played_in_april'] = pd.to_datetime(matches['date']).dt.month == 4

# Create a new column to check if the toss winner chose to field
matches['toss_field'] = matches['toss_decision'] == 'field'

# Count the number of matches played in April
april_games_count = matches['played_in_april'].sum()

# Count the number of matches where the toss winner chose to field
toss_field_count = matches['toss_field'].sum()

print(f"Number of April games: {april_games_count}")
print(f"Number of choices to field first: {toss_field_count}")


## Creating Features and Target Variable
We create a binary target variable close_match that is 1 when a match was won by 1 to 20 runs or by 1 to 4 wickets, and 0 otherwise. The features are played_in_april, toss_field, and dl_applied, each converted to 0/1 integers.


In [None]:
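# Create the 'close_match' target variable:
# a win by 1-20 runs or by 1-4 wickets
close_by_runs = (matches['win_by_runs'] > 0) & (matches['win_by_runs'] <= 20)
close_by_wickets = (matches['win_by_wickets'] > 0) & (matches['win_by_wickets'] <= 4)
matches['close_match'] = close_by_runs | close_by_wickets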

# Define the features and target variable
features = matches[['played_in_april', 'toss_field', 'dl_applied']]
target = matches['close_match']

# Convert boolean columns to integers
features = features.astype(int)
target = target.astype(int)
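
To make the labelling rule concrete, the sketch below applies it to a small hand-made table. The rows are hypothetical and are not taken from matches.xlsx, and the helper is_close is an illustrative name rather than part of the project. It assumes margins are whole numbers, so `between(1, 20)` matches the `> 0` and `<= 20` test used above.

```python
import pandas as pd

# Hypothetical toy results (not from matches.xlsx), one row per match
toy = pd.DataFrame({
    "win_by_runs":    [5, 60, 0, 0, 0],
    "win_by_wickets": [0, 0, 3, 8, 0],
})

def is_close(df, max_runs=20, max_wickets=4):
    """Return 1 for matches won by 1..max_runs runs or 1..max_wickets wickets."""
    by_runs = df["win_by_runs"].between(1, max_runs)          # inclusive on both ends
    by_wickets = df["win_by_wickets"].between(1, max_wickets)
    return (by_runs | by_wickets).astype(int)

toy["close_match"] = is_close(toy)
print(toy)

# Class balance of the toy target; the same call on the real target
# shows how often close matches occur in the dataset
print(toy["close_match"].value_counts(normalize=True))
```

Note that the last toy row, where both margins are 0 (as a tie or a match without a result might be recorded), is labelled as not close under this rule. Whether that is the intended behaviour is worth checking against the real data.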


## Model Training and Evaluation
We split the data into training and test sets and train a Decision Tree Classifier. We then evaluate the model's performance using accuracy and a confusion matrix.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Split the data into training and test sets (75:25 split)
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.25, random_state=999)

# Create and train the decision tree classifier
clf = DecisionTreeClassifier(random_state=999)
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the decision tree classifier: {accuracy:.4f}")

# Calculate the confusion matrix (rows: actual class, columns: predicted class)
cm = confusion_matrix(y_test, y_pred)

print("Confusion Matrix:")
print(cm)
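
An accuracy figure is easier to interpret next to a trivial baseline. The sketch below compares a decision tree with scikit-learn's DummyClassifier, which always predicts the most frequent training class, and unpacks the confusion matrix into named counts. It runs on synthetic random data as a stand-in for the three binary features, so the numbers it prints say nothing about the cricket dataset. The same comparison could be run on the real X_train and X_test.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: three binary features and a binary target (assumption)
rng = np.random.default_rng(999)
X = rng.integers(0, 2, size=(400, 3))
y = rng.integers(0, 2, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=999
)

# Baseline that always predicts the most frequent class seen in training
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
tree = DecisionTreeClassifier(random_state=999).fit(X_train, y_train)

baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
tree_acc = accuracy_score(y_test, tree.predict(X_test))
print(f"Baseline accuracy: {baseline_acc:.4f}")
print(f"Tree accuracy:     {tree_acc:.4f}")

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, tree.predict(X_test), labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

# Three binary features give at most 2**3 = 8 distinct inputs,
# so the fitted tree cannot have more than 8 leaves
print(f"Number of leaves: {tree.get_n_leaves()}")
```

If the tree's accuracy is no better than the baseline's, the features carry little information about the target, however high the raw accuracy looks.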


## Conclusion
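In this project, we developed a model to classify close cricket matches using three binary features: whether the match was played in April, whether the toss winner chose to field, and whether the DL method was applied. The accuracy reported above is only meaningful relative to the share of the majority class, since a model that always predicts the more common outcome reaches that accuracy without learning anything. With three binary features, the tree can distinguish at most eight feature combinations, so there is clear room for improvement. Future work could include engineering richer features, exploring more sophisticated models, and tuning hyperparameters to enhance predictive performance.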

By identifying the factors that contribute to close matches, teams and analysts can gain better insights into match dynamics and strategize accordingly.