
 # Load the datasets:

Load the training datasets from the provided CSV files.
Load the ideal functions dataset from the provided CSV file.
Load the test dataset from the provided CSV file.
Create a SQLite database:

Store the training data in one table.
Store the ideal functions in another table.
Select the best ideal functions:

For each training dataset, find the ideal function that minimizes the sum of squared deviations (Least Squares).
Map the test data to the selected ideal functions:

For each x-y pair in the test data, determine if it can be assigned to one of the four chosen ideal functions. The criterion is that the deviation should not exceed the largest deviation in the training data by more than a factor of sqrt(2).
Visualize the data:

Use Bokeh to visualize the training data, test data, chosen ideal functions, and deviations.
Implement unit tests:

Create unit tests to ensure the correctness of your functions.
Use Git for version control:

Provide the necessary Git commands for cloning the repository, committing, pushing, and creating a pull request.
 ---------------------------------------------------
 
 Explanation
 # Data Loading and Database Creation:

The load_csv function reads CSV files into pandas DataFrames.
The create_database function creates a SQLite database and populates it with the training data, ideal functions, and test data.
 # Finding Best Ideal Functions:

The find_best_ideal_functions function computes the Mean Squared Error (MSE) for each pair of training and ideal functions, selecting the one with the lowest MSE.
 # Mapping Test Data:

The map_test_data function maps each test data point to the closest ideal function based on the deviation criteria.
 # Visualization:

The visualize_data function uses Bokeh to create an interactive plot showing the training data, ideal functions, test data, and mapped data.
 # Unit Tests:

The TestFunctions class defines unit tests to validate the correctness of the best function selection and data mapping.
 # Git Commands:

The Git commands outline the process of cloning the repository, creating a new branch, committing changes, pushing them, and creating a pull request for review.
By following this solution, you should be able to complete


## The python assignment with a given dataset have done with Jupyter notebook 

In [None]:
 # step1 : firstly we should import all of the libraries which we need for this project

In [1]:

# Create a pull request on GitHub to merge your changes into the develop branch
import pandas as pd
import numpy as np
import sqlite3
from sqlalchemy import create_engine
import bokeh.plotting as bk
from bokeh.io import output_file, show
from bokeh.models import Legend
from sklearn.metrics import mean_squared_error
import math
import unittest


In [None]:
 # step2 : in the second step we create a function for importing .csv datasets and after that load all of our datasets

In [2]:
# Function to load CSV data
def load_csv(file_path):
    return pd.read_csv(file_path)

training_data = load_csv('Dataset2/train.csv')
ideal_functions = load_csv('Dataset2/ideal.csv')
test_data = load_csv('Dataset2/test.csv')



In [3]:
print(f"Training dataset:",'\n---------------------------------------------------------')

training_data.head()


Training dataset: 
---------------------------------------------------------


Unnamed: 0,x,y1,y2,y3,y4
0,-20.0,39.778572,-40.07859,-20.214268,-0.324914
1,-19.9,39.604813,-39.784,-20.07095,-0.05882
2,-19.8,40.09907,-40.018845,-19.906782,-0.45183
3,-19.7,40.1511,-39.518402,-19.389118,-0.612044
4,-19.6,39.795662,-39.360065,-19.81589,-0.306076


In [4]:
print(f"Ideal dataset:",'\n---------------------------------------------------------')

ideal_functions.head()

Ideal dataset: 
---------------------------------------------------------


Unnamed: 0,x,y1,y2,y3,y4,y5,y6,y7,y8,y9,...,y41,y42,y43,y44,y45,y46,y47,y48,y49,y50
0,-20.0,-0.912945,0.408082,9.087055,5.408082,-9.087055,0.912945,-0.839071,-0.850919,0.816164,...,-40.456474,40.20404,2.995732,-0.008333,12.995732,5.298317,-5.298317,-0.186278,0.912945,0.39685
1,-19.9,-0.867644,0.497186,9.132356,5.497186,-9.132356,0.867644,-0.865213,0.168518,0.994372,...,-40.23382,40.04859,2.99072,-0.00834,12.99072,5.293305,-5.293305,-0.21569,0.867644,0.476954
2,-19.8,-0.813674,0.581322,9.186326,5.581322,-9.186326,0.813674,-0.889191,0.612391,1.162644,...,-40.006836,39.89066,2.985682,-0.008347,12.985682,5.288267,-5.288267,-0.236503,0.813674,0.549129
3,-19.7,-0.751573,0.659649,9.248426,5.659649,-9.248426,0.751573,-0.910947,-0.994669,1.319299,...,-39.775787,39.729824,2.980619,-0.008354,12.980619,5.283204,-5.283204,-0.247887,0.751573,0.61284
4,-19.6,-0.681964,0.731386,9.318036,5.731386,-9.318036,0.681964,-0.930426,0.774356,1.462772,...,-39.54098,39.565693,2.97553,-0.008361,12.97553,5.278115,-5.278115,-0.249389,0.681964,0.667902


In [5]:
print(f"Test dataset:",'\n---------------------------------------------------------')

test_data.head()

Test dataset: 
---------------------------------------------------------


Unnamed: 0,x,y
0,17.5,34.16104
1,0.3,1.215102
2,-8.7,-16.843908
3,-19.2,-37.17087
4,-11.0,-20.263054


In [None]:
 # step 3: in this step create a function to create a sqlite database and import all of our datasets in a database file in 
 # three different tables

In [6]:
# Function to create and populate the SQLite database
def create_database(training_data, ideal_functions, test_data, db_name="data.db"):
    engine = create_engine(f'sqlite:///{db_name}')
    conn = engine.connect()

    # Create and populate training data table
    training_data.to_sql('training_data', conn, if_exists='replace', index=False)
    
    # Create and populate ideal functions table
    ideal_functions.to_sql('ideal_functions', conn, if_exists='replace', index=False)
    
    # Create and populate test data table
    test_data.to_sql('test_data', conn, if_exists='replace', index=False)
    
    return conn

In [None]:
  # step 4: in this step create a function for viewing the content from the table of SQlite database

In [7]:
# Function to view table contents
def view_table(conn, table_name):
    query = f"SELECT * FROM {table_name}"
    df = pd.read_sql(query, conn)
    return df

In [8]:
conn = create_database(training_data, ideal_functions, test_data)

# Viewing the contents of the training_data table
training_data_df = view_table(conn, 'training_data')
print("Training Data:")
print(training_data_df)

# Viewing the contents of the ideal_functions table
ideal_functions_df = view_table(conn, 'ideal_functions')
print("\nIdeal Functions:")
print(ideal_functions_df)

# Viewing the contents of the test_data table
test_data_df = view_table(conn, 'test_data')
print("\nTest Data:")
print(test_data_df)

Training Data:
        x         y1         y2         y3        y4
0   -20.0  39.778572 -40.078590 -20.214268 -0.324914
1   -19.9  39.604813 -39.784000 -20.070950 -0.058820
2   -19.8  40.099070 -40.018845 -19.906782 -0.451830
3   -19.7  40.151100 -39.518402 -19.389118 -0.612044
4   -19.6  39.795662 -39.360065 -19.815890 -0.306076
..    ...        ...        ...        ...       ...
395  19.5 -38.254158  39.661987  19.536741  0.695158
396  19.6 -39.106945  39.067880  19.840752  0.638423
397  19.7 -38.926495  40.211475  19.516634  0.109105
398  19.8 -39.276672  40.038870  19.377943  0.189025
399  19.9 -39.724934  40.558865  19.630678  0.513824

[400 rows x 5 columns]

Ideal Functions:
        x        y1        y2         y3        y4         y5        y6  \
0   -20.0 -0.912945  0.408082   9.087055  5.408082  -9.087055  0.912945   
1   -19.9 -0.867644  0.497186   9.132356  5.497186  -9.132356  0.867644   
2   -19.8 -0.813674  0.581322   9.186326  5.581322  -9.186326  0.813674   
3   -19

In [None]:
 # step 5: in this step create a function for finding the best data in the ideal function dataset

In [9]:

# Function to find the best ideal functions for the training data
def find_best_ideal_functions(training_data, ideal_functions):
    best_functions = {}
    for col in training_data.columns[1:]:
        min_mse = float('inf')
        best_func = None
        for ideal_col in ideal_functions.columns[1:]:
            mse = mean_squared_error(training_data[col], ideal_functions[ideal_col])
            if mse < min_mse:
                min_mse = mse
                best_func = ideal_col
        best_functions[col] = best_func
    return best_functions

In [55]:
print(f"The best four ideal functions:", best_functions)

The best four ideal functions: {'y1': 'y42', 'y2': 'y41', 'y3': 'y11', 'y4': 'y48'}


In [None]:
 # step 6: in this step we should map the data on the test dataset with these four best ideal functions

In [56]:
# Function to map test data to the chosen ideal functions
def map_test_data(test_data, training_data, ideal_functions, best_functions):
    mapped_data = {'x': [], 'y': [], 'Delta y': [], 'No. of ideal func': []}
    
    for index, row in test_data.iterrows():
        x, y = row['x'], row['y']
        for train_col, ideal_col in best_functions.items():
            max_deviation = max(abs(training_data[train_col] - ideal_functions[ideal_col]))
            ideal_y = ideal_functions.loc[ideal_functions['x'] == x, ideal_col].values[0]
            deviation = abs(y - ideal_y)
            if deviation <= max_deviation * math.sqrt(2):
                mapped_data['x'].append(x)
                mapped_data['y'].append(y)
                mapped_data['Delta y'].append(deviation)
                mapped_data['No. of ideal func'].append(ideal_col)
                break
    
    return pd.DataFrame(mapped_data)

In [54]:
print("mapped data for test data chosen from four ideal functions :\n",mapped_data )

mapped data for test data chosen from four ideal functions :
                X          Y   Delta Y No. of ideal func
0   1.750000e+01  34.161040  0.351148               y41
1   3.000000e-01   1.215102  0.467342               y41
2   8.000000e-01   1.426456  0.532222               y41
3   1.400000e+01  -0.066506  0.134233               y48
4  -1.500000e+01  -0.205363  0.452371               y48
5   5.800000e+00  10.711373  0.656326               y41
6  -1.980000e+01 -19.915014  0.115014               y11
7   1.890000e+01  19.193245  0.293245               y11
8   8.800000e+00  -0.726051  0.488840               y48
9  -9.500000e+00  -9.652251  0.152251               y11
10  8.100000e+00 -16.659458  0.337686               y42
11 -8.800000e+00  16.571745  0.622709               y42
12 -3.100000e+00  -2.770136  0.329864               y11
13 -1.180000e+01  24.606413  0.646196               y42
14  1.880000e+01  37.523400  0.051833               y41
15  7.700000e+00  15.392297  0.501787     

In [41]:
 # step7: in this step create a function for visuualizing all of dataset given and created by best fucnction and mapped data

In [50]:

# Function to visualize the data
def visualize_data(training_data, test_data, ideal_functions, best_functions, mapped_data):
    output_file("visualization.html")
    p = bk.figure(title="Data Visualization", x_axis_label='x', y_axis_label='y')

    # Plot training data
    for col in training_data.columns[1:]:
        p.line(training_data['x'], training_data[col], legend_label=f'Training {col}', line_width=2)
    
    # Plot ideal functions
    for col in best_functions.values():
        p.line(ideal_functions['x'], ideal_functions[col], legend_label=f'Ideal {col}', line_dash="dashed", line_width=2)
    
    # Plot test data
    p.circle(test_data['x'], test_data['y'], legend_label='Test Data', color='red', size=8)
    
    # Plot mapped data
    p.triangle(mapped_data['x'], mapped_data['y'], legend_label='Mapped Data', color='green', size=8)

    p.legend.location = "top_left"
    show(p)

In [46]:
print(visualize_data(training_data, test_data, ideal_functions, best_functions, mapped_data))

None


In [None]:
 # step 8: in this step create a class for unit test 

In [51]:

# Unit tests
class TestFunctions(unittest.TestCase):
    def setUp(self):
        self.training_data = pd.DataFrame({
            'x': [1, 2, 3],
            'y1': [1.1, 2.1, 3.1],
            'y2': [1.2, 2.2, 3.2],
            'y3': [1.3, 2.3, 3.3],
            'y4': [1.4, 2.4, 3.4]
        })
        self.ideal_functions = pd.DataFrame({
            'X': [1, 2, 3],
            'y1': [1, 2, 3],
            'y2': [1, 2, 3],
            'y3': [1, 2, 3],
            'y4': [1, 2, 3],
            'y5': [1.1, 2.1, 3.1]
        })
        self.test_data = pd.DataFrame({'x': [1, 2], 'y': [1.1, 2.1]})
        self.best_functions = find_best_ideal_functions(self.training_data, self.ideal_functions)

    def test_find_best_ideal_functions(self):
        self.assertEqual(self.best_functions, {'y1': 'y5', 'y2': 'y5', 'y3': 'y5', 'y4': 'y5'})
    
    def test_map_test_data(self):
        mapped_data = map_test_data(self.test_data, self.training_data, self.ideal_functions, self.best_functions)
        expected_data = pd.DataFrame({
            'x': [1, 2],
            'y': [1.1, 2.1],
            'Delta y': [0.0, 0.0],
            'No. of ideal func': ['Y5', 'Y5']
        })
        pd.testing.assert_frame_equal(mapped_data, expected_data)


In [None]:
 # step 9: in this step create a main function which call all of the functions step by step

In [52]:
# Main function to run the steps
def main():
    # Load datasets
    training_data = load_csv('Dataset2/train.csv')
    ideal_functions = load_csv('Dataset2/ideal.csv')
    test_data = load_csv('Dataset2/test.csv')

    # Create database and populate tables
    conn = create_database(training_data, ideal_functions, test_data)
    
    # Find best ideal functions
    best_functions = find_best_ideal_functions(training_data, ideal_functions)

    
    # Map test data to chosen ideal functions
    mapped_data = map_test_data(test_data, training_data, ideal_functions, best_functions)
    
    # Visualize the data
    visualize_data(training_data, test_data, ideal_functions, best_functions, mapped_data)

    # Close the database connection
    conn.close()
 


In [53]:
if __name__ == "__main__":
    main()
    unittest.main(argv=[''], verbosity=2, exit=False)

test_find_best_ideal_functions (__main__.TestFunctions.test_find_best_ideal_functions) ... ok
test_map_test_data (__main__.TestFunctions.test_map_test_data) ... ERROR

ERROR: test_map_test_data (__main__.TestFunctions.test_map_test_data)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py", line 3653, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas\_libs\index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'x'

The above exception was the direct cause of the foll

In [48]:
# Clone the repository and checkout the develop branch
!git clone https://github.com/MostafaMoeiniML/Python-assignment
%cd Python-assignment


# Create a new branch for your changes
!git checkout -b my_new_feature

# Add your changes
!git add .

# Commit your changes
!git commit -m "Add new feature to map test data to ideal functions"

# Push your changes to the remote repository
!git push origin my_new_feature

# Create a pull request on GitHub to merge your changes into the develop branch


D:\Python project\Written Assignment\Python-assignment\Python-assignment


Cloning into 'Python-assignment'...
Switched to a new branch 'my_new_feature'


On branch my_new_feature
nothing to commit, working tree clean


remote: 
remote: Create a pull request for 'my_new_feature' on GitHub by visiting:        
remote:      https://github.com/MostafaMoeiniML/Python-assignment/pull/new/my_new_feature        
remote: 
To https://github.com/MostafaMoeiniML/Python-assignment
 * [new branch]      my_new_feature -> my_new_feature
