## Data Loading and Handling

This section loads the data with the pandas library. The first cell imports pandas and reads the CSV file `datak.csv` into a DataFrame named `df`. The first `read_csv` call relies on pandas' default missing-value markers, which already include `NaN`. The second call overwrites `df` and uses `na_values` to treat the strings `na` and `-` as missing as well, so such entries are parsed as NaN (Not a Number) instead of as text. This keeps the numeric columns numeric and simplifies later preprocessing.

In [1]:
import pandas as pd

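# First read: default missing-value handling (this result is overwritten below)
df = pd.read_csv('datak.csv')
# Second read: additionally treat "na", "-", and "NaN" as missing values
df = pd.read_csv('datak.csv', na_values=["na", "-", "NaN"])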
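
To make the effect of `na_values` concrete, here is a minimal sketch on an in-memory CSV. The three sample rows and the player names are hypothetical stand-ins, since the contents of `datak.csv` beyond the preview are not shown here.

```python
import io

import pandas as pd

# Hypothetical three-row sample; the real datak.csv is not reproduced here.
sample_csv = io.StringIO(
    "Nama,pac,sho\n"
    "Player A,76,86\n"
    "Player B,na,82\n"
    "Player C,91,-\n"
)

sample = pd.read_csv(sample_csv, na_values=["na", "-", "NaN"])

# "na" and "-" are parsed as missing, so both numeric columns become float64
print(sample.isna().sum())
print(sample.dtypes)
```

Without `na_values`, the `pac` and `sho` columns in this sample would be read as text, because `na` and `-` cannot be parsed as numbers.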

# Load data

In [2]:
df.head()

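| Unnamed: 0 | Nama | pac | sho | pas | dri | def | phy | lbl |
|---|---|---|---|---|---|---|---|---|
| 0 | Kevin De Bruyne | 76 | 86 | 93 | 88 | 64 | 78 | 0 |
| 1 | Ederson | 87 | 82 | 93 | 88 | 64 | 88 | 0 |
| 2 | Raheem Sterling | 91 | 82 | 79 | 87 | 45 | 66 | 1 |
| 3 | Ruben Dias | 61 | 38 | 65 | 68 | 88 | 88 | 0 |
| 4 | Joao Cancelo | 85 | 71 | 83 | 84 | 80 | 72 | 0 |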


## FIFA 2021 Player Performance Dataset

The provided dataset contains performance statistics of football (soccer) players featured in the FIFA 2021 video game. Each row represents a player, and the columns provide information about the player's abilities in various aspects of the game.

![Card](./mbappe.jpeg)

### Columns Explanation:
- **Nama**: Name of the player.
- **pac**: Pace statistic of the player.
- **sho**: Shooting skill statistic of the player.
- **pas**: Passing skill statistic of the player.
- **dri**: Dribbling skill statistic of the player.
- **def**: Defending skill statistic of the player.
- **phy**: Physical attribute statistic of the player.
- **lbl**: Label indicating whether the player is a forward (1) or not a forward (0) in the game.

This dataset offers insights into different attributes and skills of each player, such as speed, shooting accuracy, passing ability, dribbling proficiency, defensive capabilities, physical attributes, and their designated role or position as either a forward or not a forward in the FIFA 2021 game.


### Feature Extraction and Normalization

In this section, we extract the features from the dataset and normalize them for further analysis.

In [3]:
# List of features to be used in analysis
features = ['pac', 'sho', 'pas', 'dri', 'def', 'phy']

# Retrieve feature values from data as a numpy array
data_features = df.loc[:, features].values

# Display the number of data points (rows)
n = len(data_features)
print("Number of data points:", n)

# Normalize feature data by dividing each value by 100
data_normalized = data_features / 100

# Display normalized feature data
print("Normalized Feature Data:")
print(data_normalized)

Number of data points: 28
Normalized Feature Data:
[[0.76 0.86 0.93 0.88 0.64 0.78]
 [0.87 0.82 0.93 0.88 0.64 0.88]
 [0.91 0.82 0.79 0.87 0.45 0.66]
 [0.61 0.38 0.65 0.68 0.88 0.88]
 [0.85 0.71 0.83 0.84 0.8  0.72]
 [0.65 0.8  0.85 0.86 0.73 0.72]
 [0.92 0.63 0.76 0.78 0.8  0.82]
 [0.8  0.76 0.83 0.88 0.46 0.64]
 [0.84 0.78 0.8  0.87 0.56 0.57]
 [0.59 0.72 0.75 0.78 0.84 0.75]
 [0.81 0.79 0.81 0.9  0.38 0.6 ]
 [0.89 0.91 0.65 0.8  0.45 0.88]
 [0.71 0.89 0.75 0.87 0.33 0.69]
 [0.84 0.86 0.86 0.9  0.4  0.6 ]
 [0.83 0.81 0.86 0.87 0.48 0.69]
 [0.86 0.87 0.67 0.81 0.39 0.77]
 [0.87 0.78 0.85 0.9  0.36 0.45]
 [0.7  0.73 0.75 0.85 0.65 0.66]
 [0.63 0.51 0.73 0.72 0.81 0.78]
 [0.65 0.31 0.65 0.68 0.83 0.72]
 [0.81 0.74 0.77 0.82 0.67 0.74]
 [0.58 0.74 0.78 0.77 0.75 0.77]
 [0.56 0.7  0.81 0.77 0.77 0.75]
 [0.76 0.79 0.75 0.8  0.58 0.68]
 [0.78 0.77 0.85 0.81 0.34 0.76]
 [0.5  0.64 0.79 0.69 0.69 0.8 ]
 [0.36 0.67 0.75 0.73 0.8  0.73]
 [0.73 0.42 0.58 0.67 0.79 0.77]]
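
The normalization step can be checked on a small array. The sketch below uses the first three feature rows from the `df.head()` preview; it assumes, as the notebook does, that all ratings lie on a 0 to 100 scale, so dividing by 100 maps them into the range 0 to 1.

```python
import numpy as np

# First three rows of the feature columns, copied from the df.head() preview
raw = np.array([
    [76, 86, 93, 88, 64, 78],
    [87, 82, 93, 88, 64, 88],
    [91, 82, 79, 87, 45, 66],
])

# Broadcasting divides every element by 100 in one step
scaled = raw / 100

print(scaled.shape)  # (number of rows, number of feature dimensions)
print(scaled[0])
```

The first printed row should match the first row of the normalized output above.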


### Target Data and Dimensionality

In this section, we determine the dimensionality of the feature data and retrieve the target values.

The next cell first computes the number of feature dimensions, that is, the number of columns in `data_normalized`. This is also the number of weights the perceptron needs. The cell then selects the target column `lbl` and retrieves its values as a NumPy array. These are the labels the model is trained to predict.

In [4]:
# Number of dimensions in feature data
n_dimensions = len(data_normalized[0])
print("Number of Feature Dimensions:", n_dimensions)

# Target column name
target_column = 'lbl'

# Retrieve target values from data as a numpy array
data_target = df.loc[:, target_column].values

# Display target values
print("Target Data:")
print(data_target)

Number of Feature Dimensions: 6
Target Data:
[0 0 1 0 0 1 0 1 1 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0]
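
A quick look at the class balance helps when interpreting accuracy later. The following sketch counts the two labels with `np.bincount`; the array is copied from the target output above, and the variable names are chosen for illustration.

```python
import numpy as np

# Target values copied from the printed output above
labels = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1,
                   1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# counts[0] is the number of non-forwards, counts[1] the number of forwards
counts = np.bincount(labels, minlength=2)

print("Not forward (0):", counts[0])
print("Forward (1):", counts[1])
```

With 17 non-forwards and 11 forwards, a model that always predicts 0 would already be right on 17 of the 28 training rows, which is a useful baseline to keep in mind.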


### Model Training: Perceptron Learning Algorithm

In this section, we train a perceptron on the normalized data. The perceptron learning rule adjusts the weights after each data point in order to reduce prediction errors.

The training cell sets the learning rate to 0.9, initializes all six weights to 1, and uses a threshold of 0. It then makes one pass over the training data.

For each data point, the algorithm computes the weighted sum `V` of the normalized feature values and their corresponding weights. If `V` is less than the threshold, the prediction is 0; otherwise, it is 1. The error is the target value minus the prediction, so it is 0 for a correct prediction and +1 or -1 for a wrong one.

Each weight is then increased by `learning_rate * error * feature value`, which means the weights change only when the prediction is wrong.

The list `error_list` records the error for every data point and is used afterwards to monitor convergence.

This cell performs only the first pass over the data; further passes are run in the next section.

In [5]:
# Learning rate value
learning_rate = 0.9

# Initialize initial weights
weights = [1, 1, 1, 1, 1, 1]

# Threshold value
threshold = 0

# List to store errors
error_list = []

# Model training
for i in range(n):
    V = 0
    for k in range(n_dimensions):
        V += data_normalized[i][k] * weights[k]
    if V < threshold:
        prediction = 0
    else:
        prediction = 1
    error = data_target[i] - prediction
    error_list.append(error)
    for k in range(n_dimensions):
        weights[k] += learning_rate * error * data_normalized[i][k]
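
The same update rule can be written more compactly with NumPy. The sketch below is one possible restatement, not the notebook's own code: the function name `perceptron_pass` is an assumption, and the single sample is the first normalized row from the output above with its target of 0.

```python
import numpy as np


def perceptron_pass(X, y, weights, learning_rate=0.9, threshold=0.0):
    """Run one pass over the data; return updated weights and per-sample errors."""
    w = np.asarray(weights, dtype=float).copy()
    errors = []
    for x_i, t_i in zip(X, y):
        v = np.dot(x_i, w)                      # weighted sum V
        prediction = 0 if v < threshold else 1  # same threshold rule as above
        error = int(t_i) - prediction
        errors.append(error)
        w += learning_rate * error * x_i        # no change when error == 0
    return w, errors


# First normalized row and its target, copied from the outputs above
X = np.array([[0.76, 0.86, 0.93, 0.88, 0.64, 0.78]])
y = np.array([0])

w, errors = perceptron_pass(X, y, np.ones(6))
print(errors)
print(np.round(w, 3))
```

With all weights at 1, the weighted sum for this row is 4.85, so the prediction is 1 while the target is 0. The error is -1, and each weight drops by 0.9 times the corresponding feature value.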

### Iterative Model Refinement: Epochs and Convergence

In this section, we refine the trained perceptron model iteratively using the concept of epochs. Each epoch consists of updating the model weights and checking for convergence.

An epoch is one complete pass through the entire training set. In each epoch, the cell repeats the per-sample procedure from the first training pass: compute the weighted sum `V`, predict 0 if `V` is below the threshold and 1 otherwise, take the error as the target value minus the prediction, and update every weight by `learning_rate * error * feature value`.

The loop stops once the error list of an epoch is identical to the error list of the previous epoch. This criterion detects that the pattern of errors has stopped changing; it does not by itself guarantee that every error is zero. The code sets no explicit cap on the number of epochs, so the number of passes is determined entirely by this stopping test.

In [6]:
# Iterative model refinement
previous_errors = []
loop = 0
while error_list != previous_errors:
    previous_errors = error_list.copy()
    error_list = []
    for i in range(n):
        V = 0
        for k in range(n_dimensions):
            V += data_normalized[i][k] * weights[k]
        if V < threshold:
            prediction = 0
        else:
            prediction = 1
        error = data_target[i] - prediction
        error_list.append(error)
        for k in range(n_dimensions):
            weights[k] += learning_rate * error * data_normalized[i][k]
    loop += 1
    print("Loop:", loop)


Loop: 1
Loop: 2
Loop: 3
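
For comparison, the following is a minimal sketch of a different, commonly used stopping rule: stop when an epoch produces no errors, with a cap on the number of epochs as a safeguard. The function name `train_perceptron`, the `max_epochs` parameter, and the two-sample toy data are assumptions for illustration and are not part of the notebook.

```python
import numpy as np


def train_perceptron(X, y, weights, learning_rate=0.9, threshold=0.0, max_epochs=100):
    """Train until an epoch has no errors or max_epochs (assumed >= 1) is reached."""
    w = np.asarray(weights, dtype=float).copy()
    history = []  # error list of every epoch
    for _ in range(max_epochs):
        errors = []
        for x_i, t_i in zip(X, y):
            prediction = 0 if np.dot(x_i, w) < threshold else 1
            error = int(t_i) - prediction
            errors.append(error)
            w += learning_rate * error * x_i
        history.append(errors)
        if not any(errors):  # every sample was classified correctly
            break
    return w, history


# Hypothetical toy data: two samples, two features
X_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
y_toy = np.array([1, 0])

w_toy, history = train_perceptron(X_toy, y_toy, [1.0, 1.0])
for epoch, errors in enumerate(history, start=1):
    print("Epoch", epoch, "errors:", errors)
```

On this toy data the first two epochs produce the same error list even though one sample is misclassified in both, which is why the sketch checks for an all-zero error list instead of an unchanged one.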


In [None]:
# Import data for testing
dff = pd.read_csv('datak1.csv', na_values=["na", "-", "NaN"])

# Retrieve feature values from testing data as a numpy array
dff_features = dff.loc[:, features].values

# Normalize testing feature data by dividing each value by 100
dff_normalized = dff_features[:, :] / 100

# Retrieve target values from testing data as a numpy array
dff_target = dff.loc[:, target_column].values

# List to store predictions
predictions = []

# Model testing
for i in range(len(dff)):
    V = 0
    for k in range(n_dimensions):
        V += dff_normalized[i][k] * weights[k]
    if V < threshold:
        prediction = 0
    else:
        prediction = 1
    predictions.append(prediction)

# Calculate accuracy
correct_predictions = sum(1 for i in range(len(dff)) if predictions[i] == dff_target[i])
accuracy = (correct_predictions / len(dff)) * 100
print('Accuracy of the data is:', accuracy, 'percent')
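
The prediction and accuracy steps can also be expressed with array operations. This is a sketch under stated assumptions: the helper names `predict` and `accuracy_percent`, the weights, and the four test rows are hypothetical, and the inputs are assumed to contain no missing values.

```python
import numpy as np


def predict(X, weights, threshold=0.0):
    """Predict 1 where the weighted sum reaches the threshold, otherwise 0."""
    return (X @ weights >= threshold).astype(int)


def accuracy_percent(predictions, targets):
    """Share of predictions that equal the targets, as a percentage."""
    return float(np.mean(predictions == targets) * 100)


# Hypothetical weights and test data, for illustration only
w_example = np.array([1.0, -1.0])
X_test = np.array([[0.9, 0.2], [0.3, 0.8], [0.7, 0.6], [0.1, 0.9]])
y_test = np.array([1, 0, 1, 1])

preds = predict(X_test, w_example)
print("Predictions:", preds)
print("Accuracy:", accuracy_percent(preds, y_test), "percent")
```

Here three of the four hypothetical rows are predicted correctly, giving 75 percent. The comparison `>= threshold` mirrors the notebook's rule of predicting 0 only when `V` is below the threshold.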