##  Unit Testing for Telecommunication Billing Data Pipeline WK 11 ~ JKoruda

# Background Information
You are working on a project related to telecommunication billing data. As part of the project, a
data pipeline has been provided to you. The data pipeline is responsible for extracting data from
a CSV file, performing transformations using pandas, and storing the transformed data in
another CSV file. Your task is to write unit tests for the functions in the data pipeline using the
unittest framework.

# Guidelines
* Use the unittest framework to create test cases for each function in the data
pipeline.
* Write at least three test cases for each function, covering different scenarios and edge
cases.
* Ensure that your tests are independent and do not rely on each other.
* Name your test methods descriptively to indicate the scenario being tested.
* Use assertions to validate the expected behavior of each function.
* Provide informative error messages when assertions fail to aid in debugging.
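
The guidelines above can be illustrated with a minimal sketch. The `add_tax` helper below is hypothetical and is not part of the provided pipeline; it only stands in for a function under test so that the naming, independence, and error-message conventions are visible.

```python
import unittest


def add_tax(billing_amount, tax_amount):
    """Hypothetical helper, used only for this illustration."""
    return billing_amount + tax_amount


class TestAddTax(unittest.TestCase):
    # Each test builds its own inputs, so the tests do not rely on each other.

    def test_add_tax_returns_sum_for_positive_amounts(self):
        result = add_tax(100.0, 8.5)
        self.assertEqual(result, 108.5, f'Expected 108.5 but got {result}')

    def test_add_tax_with_zero_tax_returns_billing_amount(self):
        result = add_tax(100.0, 0.0)
        self.assertEqual(result, 100.0, f'Expected 100.0 but got {result}')

    def test_add_tax_raises_type_error_for_string_amount(self):
        # Adding a str and a float raises TypeError in Python
        with self.assertRaises(TypeError):
            add_tax('100', 8.5)


# Run the tests without exiting the interpreter (useful in a notebook)
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAddTax)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The method names state the scenario, and the third argument to `assertEqual` is the message shown when the assertion fails.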


# Dataset composition
The dataset contains telecommunication billing data with the following columns:
* customer_id
* billing_amount
* tax_amount

You can use this file as the input for your unit tests.
* File: billing_data.csv
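
When the real file is not at hand, a small stand-in with the same three columns can be generated for the tests. The sketch below is an assumption, not the actual dataset: the row values are invented, and storing billing_amount as a string with a leading `$` is inferred from the transformation step of the pipeline.

```python
import os
import tempfile

import pandas as pd

# Hypothetical rows; the real billing_data.csv may look different.
# The second and third rows are exact duplicates on purpose.
sample = pd.DataFrame({
    'customer_id': [1, 2, 2, 3],
    'billing_amount': ['$100.00', '$250.50', '$250.50', '$75.25'],
    'tax_amount': [10.0, 25.25, 25.25, 7.5],
})

# Write the sample to a temporary directory so the working directory stays clean
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'sample_billing_data.csv')
sample.to_csv(sample_path, index=False)

# Read it back the same way the pipeline would
reloaded = pd.read_csv(sample_path)
print(reloaded)
```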

Import pandas and the unittest library

In [3]:
import pandas as pd
import unittest


Define the path to the input file

In [4]:
# File path as defined by the file location in Colab
file_path = '/content/billing_data.csv'


Extract data from the CSV file

In [13]:
def data_extraction(file_path):
    data = pd.read_csv(file_path)
    return data


Transform data columns

In [15]:
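def data_transformation(data):
    # Copy after dropping duplicates to avoid pandas' SettingWithCopyWarning
    data = data.drop_duplicates().copy()
    # Strip the literal '$' sign (regex=False) before converting to float
    data['billing_amount'] = data['billing_amount'].str.replace('$', '', regex=False).astype(float)
    # Total charges are the billing amount plus the tax amount
    data['total_charges'] = data['billing_amount'] + data['tax_amount']
    return data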
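
A quick way to see what the transformation does is to run it on a handful of rows. The sketch below repeats the function so that it runs on its own, with the literal `$` match made explicit via `regex=False`; the input rows are hypothetical and assume that billing_amount is stored as a string with a leading `$`.

```python
import pandas as pd


def data_transformation(data):
    # Copy of the pipeline function, with the literal match made explicit
    data = data.drop_duplicates().copy()
    data['billing_amount'] = (
        data['billing_amount'].str.replace('$', '', regex=False).astype(float)
    )
    data['total_charges'] = data['billing_amount'] + data['tax_amount']
    return data


# Hypothetical input rows; the second and third rows are exact duplicates
raw = pd.DataFrame({
    'customer_id': [1, 2, 2, 3],
    'billing_amount': ['$100.00', '$250.50', '$250.50', '$75.25'],
    'tax_amount': [10.0, 25.25, 25.25, 7.5],
})

transformed = data_transformation(raw)
print(transformed)
print(transformed['total_charges'].tolist())  # [110.0, 275.75, 82.75]
```

One duplicate row is dropped, billing_amount becomes a float column, and the input frame `raw` is left unchanged.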


Load transformed data into output file

In [16]:
# Path to the output file
output_file = '/content/billing_data_output.csv'


def data_loading(data, output_file):
    data.to_csv(output_file, index=False)
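
The loading step can be checked with a round trip: write a frame, read it back, and compare. The sketch below repeats `data_loading` so that it runs on its own and writes to a temporary directory; the frame contents are hypothetical.

```python
import os
import tempfile

import pandas as pd


def data_loading(data, output_file):
    # index=False keeps the row index out of the CSV file
    data.to_csv(output_file, index=False)


# Hypothetical transformed data
transformed = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'billing_amount': [100.0, 250.5, 75.25],
    'tax_amount': [10.0, 25.25, 7.5],
    'total_charges': [110.0, 275.75, 82.75],
})

# The temporary directory and its contents are removed when the block exits
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'billing_data_output.csv')
    data_loading(transformed, out_path)
    loaded = pd.read_csv(out_path)

# Raises AssertionError with a detailed diff if the frames differ
pd.testing.assert_frame_equal(loaded, transformed)
print(list(loaded.columns))
```

Because the file is written with `index=False`, the file read back has exactly the four data columns and no extra index column.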

Run test cases on the data extraction, transformation and loading procedures

In [None]:
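import os


class TestDataPipeline(unittest.TestCase):

    # Set up the file paths and a reference copy of the input data
    def setUp(self):
        self.file_path = 'billing_data.csv'
        self.output_file = 'billing_data_output.csv'
        self.data = pd.read_csv(self.file_path)

    # Remove the output file after each test so that the tests stay independent
    def tearDown(self):
        if os.path.exists(self.output_file):
            os.remove(self.output_file)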

    def test_data_extraction(self):
        # Test the data extraction function
        extracted_data = data_extraction(self.file_path)

        # Test that the extracted data is a DataFrame with the expected shape
        self.assertIsInstance(extracted_data, pd.DataFrame)
        self.assertEqual(extracted_data.shape, self.data.shape)

        # Test the columns of the extracted data
        existing_columns = ['customer_id', 'billing_amount', 'tax_amount']
        self.assertListEqual(list(extracted_data.columns), existing_columns)

        # Test that the extracted data is not empty
        self.assertFalse(extracted_data.empty, 'Extracted data is empty')

    def test_data_transformation(self):
        extracted_data = data_extraction(self.file_path)
        transformed_data = data_transformation(extracted_data)

        # Test that the transformed data is a DataFrame
        self.assertIsInstance(transformed_data, pd.DataFrame)

        # Test that duplicate rows were removed
        self.assertEqual(len(transformed_data), len(self.data.drop_duplicates()))

        # Test conversion of the billing and tax amounts to numeric types
        self.assertTrue(pd.api.types.is_numeric_dtype(transformed_data['tax_amount']))
        self.assertTrue(pd.api.types.is_numeric_dtype(transformed_data['billing_amount']))

        # Test that total_charges is the sum of the billing and tax amounts
        expected_total = transformed_data['billing_amount'] + transformed_data['tax_amount']
        pd.testing.assert_series_equal(
            transformed_data['total_charges'], expected_total, check_names=False)


    def test_data_loading(self):
        # Transform the data first so that the output includes total_charges
        transformed_data = data_transformation(self.data).reset_index(drop=True)

        # Test data loading into a CSV file
        data_loading(transformed_data, self.output_file)
        loaded_data = pd.read_csv(self.output_file)

        # Validate that the loaded data is a DataFrame with the expected shape
        self.assertIsInstance(loaded_data, pd.DataFrame)
        self.assertEqual(loaded_data.shape, transformed_data.shape)

        # Validate the columns of the output file
        expected_columns = ['customer_id', 'billing_amount', 'tax_amount', 'total_charges']
        self.assertListEqual(list(loaded_data.columns), expected_columns)

        # Validate the loaded file against the transformed data
        pd.testing.assert_frame_equal(loaded_data, transformed_data)

if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)
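
The guidelines ask for edge cases as well as the normal path. As a sketch of what that could look like for the extraction step, the tests below cover a missing file, a header-only file, and a completely empty file. `data_extraction` is repeated so that the block runs on its own, and the class name and file names are hypothetical.

```python
import os
import tempfile
import unittest

import pandas as pd


def data_extraction(file_path):
    data = pd.read_csv(file_path)
    return data


class TestDataExtractionEdgeCases(unittest.TestCase):

    def setUp(self):
        # A fresh temporary directory per test keeps the tests independent
        self.tmp_dir = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp_dir.cleanup)

    def _write(self, name, text):
        # Helper that creates a file with the given text and returns its path
        path = os.path.join(self.tmp_dir.name, name)
        with open(path, 'w') as handle:
            handle.write(text)
        return path

    def test_missing_file_raises_file_not_found_error(self):
        missing = os.path.join(self.tmp_dir.name, 'does_not_exist.csv')
        with self.assertRaises(FileNotFoundError):
            data_extraction(missing)

    def test_header_only_file_returns_empty_dataframe(self):
        path = self._write('header_only.csv', 'customer_id,billing_amount,tax_amount\n')
        data = data_extraction(path)
        self.assertTrue(data.empty, 'Expected no rows for a header-only file')
        self.assertListEqual(
            list(data.columns), ['customer_id', 'billing_amount', 'tax_amount'])

    def test_empty_file_raises_empty_data_error(self):
        path = self._write('empty.csv', '')
        with self.assertRaises(pd.errors.EmptyDataError):
            data_extraction(path)


# Run the edge-case tests without exiting the interpreter
edge_suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDataExtractionEdgeCases)
edge_result = unittest.TextTestRunner(verbosity=0).run(edge_suite)
```

Similar edge cases could be written for the transformation and loading steps, for example input with no duplicates or an output path in a directory that does not exist.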