<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Manufacturing Defect Analysis
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Introduction</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Consider a major auto manufacturer as our client. The client has reported a serious business issue: warranty repairs are increasing, driven primarily by battery pack replacements in its electric vehicles (EVs). For an EV manufacturer, the battery is one of the most expensive and critical components in the product. The client wants us to find the root cause and provide actionable insights.</p>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Data:</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The data for this demonstration resides on Vantage as well as on AWS S3. In a modern analytic ecosystem, vast amounts of data are collected from manufacturing lines and even from sensors on the finished product. That data can be left on inexpensive cloud storage and accessed when investigating a problem. Low-volume, high-value data that is referenced frequently is kept inside Vantage, where it has gone through traditional ETL processes to ensure quality, ease of analysis, and performance.
</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this demonstration, we will use the structured data inside Vantage to narrow down a problem, then go out to the cloud storage to define a view that shreds a subset of that data so we can join both data sets in a single query to solve our business issue.  A description of the tables involved in this demo is at the end of this notebook.
</p>



<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Steps in the analysis:</b></p>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Configuring the environment</li>
    <li>Connect to Vantage</li>
    <li>Narrow down the root cause
        <ul>
            <li>3.1 Were the defects based on the dealers?</li>
            <li>3.2 Were the defects based on the model of the cars?</li>
            <li>3.3 Were the defects based on the assembly plants?</li>
            <li>3.4 Were the defects based on the battery cells?</li>
            <li>3.5 Were the defects based on the lot numbers?</li>
        </ul>
    </li>
    <li>Analysis of test reports from the Data Lake
        <ul>
            <li>4.1 Create a foreign table to access the JSON data from Amazon S3</li>
            <li>4.2 Access and join the JSON manufacturing test data natively in Vantage</li>
        </ul>
    </li>
    <li>Cleanup</li>
</ol>

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>1. Configuring the Environment</b>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Installing some dependencies</b></p>

In [None]:
%%capture
!pip install --upgrade teradataml

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: </b><i>The install statement above is only needed if you run the notebook on a platform other than ClearScape Analytics Experience that does not already have the libraries installed. If you run the install, be sure to restart the kernel afterwards to bring the installed libraries into memory. The simplest way to restart the kernel is by typing zero zero: <b> 0 0</b></i></p>
</div>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
import pandas as pd
import getpass
from teradataml import *
import plotly.express as px

pd.set_option('display.max_colwidth', None)

import warnings
warnings.filterwarnings('ignore')

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>2. Connect to Vantage</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=ManufacturingDefects.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We begin running steps with Shift + Enter keys. </p>

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We have provided data for this demo on cloud storage. We have the option of either running the demo using foreign tables to access the data without using any storage on our environment or downloading the data to local storage, which may yield somewhat faster execution. However, we need to consider available storage. There are two statements in the following cell, and one is commented out. We may switch which mode we choose by changing the comment string.</p>

In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_EVCarBattery_cloud');"        # Takes 15 seconds
%run -i ../run_procedure.py "call get_data('DEMO_EVCarBattery_local');"        # Takes 4 minutes

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Optional step – We should execute the below step only if we want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"        # Takes 10 seconds

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>3. Narrow down the root cause</b>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this section, we will investigate the source of defective batteries. We’ll address questions such as whether we have installed bad batteries in a specific model and whether cars with faulty batteries are sold by a particular dealer.</p>

In [None]:
dealers = DataFrame(in_schema('DEMO_EVCarBattery', 'dealers'))
bad_batteries = DataFrame(in_schema('DEMO_EVCarBattery', 'bad_batteries'))
vehicles = DataFrame(in_schema('DEMO_EVCarBattery', 'vehicles'))
mfg_plants = DataFrame(in_schema('DEMO_EVCarBattery', 'mfg_plants'))
bom = DataFrame(in_schema('DEMO_EVCarBattery', 'bom'))
parts = DataFrame(in_schema('DEMO_EVCarBattery', 'parts'))

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>3.1 Were the defects based on the dealers?</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Is any specific dealer responsible for the increase in warranty claims?</p>

In [None]:
res = vehicles.join(bad_batteries, how='inner', on='vin', lprefix='v', rprefix='bb')\
    .join(dealers, how='inner', on=['dealer_id=id'], lprefix='t', rprefix='d')\
    .select(['Company', 'customer_id'])\
    .groupby('Company')\
    .count()

res = res.to_pandas()

fig = px.pie(
    data_frame=res,
    values='count_customer_id',
    names='Company',
    title='Proportion of bad battery warranty claims by dealers'
)

fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Warranty claims for cars with defective batteries are spread evenly across all dealers, so dealers are not the culprits here. The issue must lie in an earlier stage of the manufacturing pipeline.</p>
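<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For readers without a Vantage connection, the join, group and count pattern used above can be sketched in plain pandas. This is only an illustration of the logic, not the teradataml code path: the rows below are hypothetical toy data, and the <code>share_pct</code> column is an addition that makes the pie-chart proportions explicit.</p>

```python
import pandas as pd

# Hypothetical toy rows; the real tables live in the DEMO_EVCarBattery database.
vehicles = pd.DataFrame({
    "vin": ["V1", "V2", "V3", "V4", "V5", "V6"],
    "customer_id": [101, 102, 103, 104, 105, 106],
    "dealer_id": [1, 1, 2, 2, 3, 3],
})
bad_batteries = pd.DataFrame({"vin": ["V1", "V3", "V4", "V6"]})
dealers = pd.DataFrame({
    "id": [1, 2, 3],
    "Company": ["Dealer A", "Dealer B", "Dealer C"],
})

# Keep only vehicles with a bad battery, attach the dealer name,
# then count the claims per dealer.
claims = (
    vehicles.merge(bad_batteries, on="vin", how="inner")
    .merge(dealers, left_on="dealer_id", right_on="id", how="inner")
    .groupby("Company")["customer_id"]
    .count()
    .rename("claims")
    .reset_index()
)

# Express each dealer's count as a percentage of all claims.
claims["share_pct"] = (100 * claims["claims"] / claims["claims"].sum()).round(1)
print(claims)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The same shape of query, with a different grouping column, underlies the model and plant breakdowns that follow.</p>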

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>3.2 Were the defects based on the model of the cars?</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Has any specific car model been fitted with defective batteries? We use the same set of battery parts in several different models across our product line.</p>

In [None]:
res = vehicles.join(bad_batteries, how='inner', on='vin', lprefix='v', rprefix='bb')\
    .select(['model', 'customer_id'])\
    .groupby('model')\
    .count()

res = res.to_pandas()

fig = px.pie(
    data_frame=res,
    values='count_customer_id',
    names='model',
    title='Proportion of bad battery warranty claims by vehicle model'
)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Warranty claims for cars with defective batteries are spread almost evenly across all models, so car models are not the culprits here. The issue must lie in an earlier stage of the manufacturing pipeline.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>3.3 Were the defects based on the assembly plants?</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Is any specific manufacturing plant installing the defective batteries in the cars?</p>

In [None]:
res = vehicles.join(bad_batteries, how='inner', on='vin', lprefix='v', rprefix='bb')\
    .join(mfg_plants, how='inner', on=['mfg_plant_id=id'], lprefix='t', rprefix='mfg')\
    .select(['Company', 'customer_id'])\
    .groupby('Company')\
    .count()

res = res.to_pandas()

fig = px.pie(
    data_frame=res,
    values='count_customer_id',
    names='Company',
    title='Proportion of bad battery warranty claims by manufacturing plant'
)

fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A whopping 81.4% of warranty claims for defective batteries are from a single manufacturing plant, i.e. Jackson Plant. We have found the culprit here!</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>3.4 Were the defects based on the battery cells?</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Although we have found the manufacturing plant that installed the most defective batteries, it still makes sense to dig deeper into the finer details. Let's find out which battery cells (types of battery) are installed in cars with bad batteries:</p>

In [None]:
res = bom.join(bad_batteries, how='inner', on='vin', lprefix='bom', rprefix='bb')\
    .join(parts, how='inner', on='part_no', lprefix='t', rprefix='p')\
    .select(['id', 'p_part_no', 'description'])\
    .groupby(['p_part_no', 'description'])\
    .count()

res = res[res.description.like('Battery Cell%')].to_pandas()

fig = px.pie(
    data_frame=res,
    values='count_id',
    names='p_part_no',
    title='Proportion of bad battery warranty claims by battery cell'
)

fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Ok, we have an issue with part_no '20rd0'! This part (a type of EV battery) accounts for the majority of warranty claims for bad batteries.</p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>3.5 Were the defects based on the lot numbers?</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the previous step, we found the exact part that was the cause of the bad batteries. But are all the batteries with part_no '20rd0' faulty or is there any correlation with the lot number? In simple terms, were there any specific lot(s) which produced faulty batteries? We store detailed manufacturing data in our integrated data warehouse. Let us use that to perform the analysis.</p>

In [None]:
res = bom.join(bad_batteries, how='inner', on='vin', lprefix='bom', rprefix='bb')\
    .join(parts, how='inner', on='part_no', lprefix='t', rprefix='p')\
    .select(['id', 'p_part_no', 'description', 'lot_no'])\
    .groupby(['p_part_no', 'lot_no', 'description'])\
    .count()

res[res.description.like('Battery Cell%')].sort('count_id', ascending = False)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Ok, now we know the underlying issue with part_no '20rd0': the majority of the failures come from battery lot '4102', which turns out to have been delivered to the Jackson Plant. This lot has a considerable number of faulty batteries that are driving our warranty replacements.
<br>
<br>
Great! We have found the root cause of the increased warranty claims for faulty batteries. Could we do more to provide actionable insights?</p>
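<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The lot ranking can also be illustrated offline. The sketch below assumes a small, hypothetical table of failed parts that has already been joined to the parts master; apart from part_no '20rd0' and lot '4102', every part number, lot number and description is made up. The <code>startswith</code> filter plays the role of <code>like('Battery Cell%')</code>.</p>

```python
import pandas as pd

# Hypothetical rows, already joined to the parts master for brevity.
failed_parts = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "part_no": ["20rd0", "20rd0", "20rd0", "20rd0", "11ab2", "77xy9"],
    "lot_no": ["4102", "4102", "4102", "3907", "5001", "6000"],
    "description": ["Battery Cell A"] * 4 + ["Battery Cell B", "Wiper Motor"],
})

# Keep battery cells only (prefix match, like SQL LIKE 'Battery Cell%').
is_cell = failed_parts["description"].str.startswith("Battery Cell")

# Count failures per part and lot, largest first.
lot_counts = (
    failed_parts[is_cell]
    .groupby(["part_no", "lot_no"], as_index=False)
    .agg(count_id=("id", "count"))
    .sort_values("count_id", ascending=False, ignore_index=True)
)
print(lot_counts)
```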

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>4. Analysis of Test reports from Data Lake</b>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Taking this analysis even further, we want to understand how we can detect bad batteries before they end up in our customers' cars. This will help us avoid expensive warranty repair cycles and poor customer satisfaction in the future. When the cars are manufactured, we store detailed test reports for the various parts and subsystems that comprise the vehicle. This data is voluminous and semi-structured, and it is loaded directly into our Data Lake, which is housed in an object store (AWS S3).
<br>
<br>
Using <b>Teradata Vantage</b> we can <b>natively</b> pull in this data and use it for our analysis!</p>
<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>4.1 Create a foreign table to access the JSON data from Amazon S3</b></p>

In [None]:
%%capture
query = '''
CREATE FOREIGN TABLE test_reports
(
    Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
    payload JSON(16776192) INLINE LENGTH 64000 CHARACTER SET LATIN)
USING (
    Location ('/s3/s3.amazonaws.com/trial-datasets/EVCarBattery')
), NO PRIMARY INDEX;
'''

execute_sql(query)

In [None]:
DataFrame('test_reports').sample(n=2)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The raw payload above is hard to read, so let us put a user-friendly view on top of the foreign table to shred the files and make the test report data easier to access:</p>

In [None]:
%%capture
query = '''
REPLACE VIEW test_reports_v AS
(
    SELECT tr.part_no, tr.lot_no, test_report,
    tr.test_report."Rated Capacity" AS rated_capacity,
    tr.test_report."Static Capacity Test"."Measured Average Capacity" AS measured_capacity
    FROM (SELECT
        vin,
        part_no,
        lot_no,
        CAST(test_report AS JSON) test_report
    FROM TD_JSONSHRED(
        ON (SELECT payload.vin as vin, payload FROM test_reports)
                USING
                ROWEXPR('parts')
                COLEXPR('part_no', 'lot_no', 'test_report') 
                RETURNTYPES('VARCHAR(17)', 'VARCHAR(1000)', 'VARCHAR(10000)')
    ) AS d1 (vin, part_no, lot_no, test_report)) tr
)
'''

execute_sql(query)

In [None]:
test_reports = DataFrame('test_reports_v')
test_reports

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>So now we have test reports for parts of each vehicle. Batteries have detailed test reports rather than just pass/fail. Can we use this data to provide any valuable insights?</p>
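<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the idea of shredding concrete, here is a minimal pure-Python sketch of what the view does conceptually: it emits one row per element of the <code>parts</code> array and pulls out the nested capacity fields. It does not call Vantage or <code>TD_JSONSHRED</code>. The payload is hypothetical; only the key names are taken from the view definition, and the plain "Pass" report for a non-battery part is an assumption.</p>

```python
import json

# Hypothetical payload; values are invented, key names follow the view definition.
raw = """
{
  "vin": "HYPOTHETICALVIN01",
  "parts": [
    {"part_no": "20rd0", "lot_no": "4102",
     "test_report": {"Rated Capacity": 6.5,
                     "Static Capacity Test": {"Measured Average Capacity": 6.42}}},
    {"part_no": "77xy9", "lot_no": "6000", "test_report": "Pass"}
  ]
}
"""


def shred(doc):
    """Return one flat row per entry in doc['parts']."""
    rows = []
    for part in doc["parts"]:
        report = part.get("test_report")
        # Assumption: parts without a detailed report carry a plain string.
        if not isinstance(report, dict):
            report = {}
        static_test = report.get("Static Capacity Test", {})
        rows.append({
            "vin": doc["vin"],
            "part_no": part["part_no"],
            "lot_no": part["lot_no"],
            "rated_capacity": report.get("Rated Capacity"),
            "measured_capacity": static_test.get("Measured Average Capacity"),
        })
    return rows


rows = shred(json.loads(raw))
for row in rows:
    print(row)
```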

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>4.2 Access and join the JSON manufacturing test data natively in Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Different parts report different data when tested. The test report for the battery has detailed data on its performance after assembly but before it is fitted into the vehicle.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We want to compare the rated and measured capacities along with part/lot numbers for just the batteries - we can easily drill into the JSON data using simple dot notation to access the test results we need.</p>

In [None]:
res = test_reports.join(parts, how='inner', on='part_no', lprefix='tr', rprefix='p')\
    .select(['p_part_no', 'description', 'lot_no', 'rated_capacity', 'measured_capacity'])

res = res[res.description.like('Battery Cell%')].to_pandas(all_rows = True)
res['measured_capacity'] = round(res['measured_capacity'].astype('float'), 2)

In [None]:
fig = px.violin(res, y = "measured_capacity", color = "lot_no")
fig.add_hline(y = 6.9)
fig.add_hline(y = 6.0)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The graph above illustrates the distribution of the measured battery capacity, grouped by lot number. Each line (as it appears) is a violin plot. Feel free to zoom in for a clearer view. Violin plots display the probability density of the data, in this case the measured capacity per lot. Hover over a violin plot to view details such as the minimum, maximum, median, and quantiles.
<br>
<br>
One notable observation is the pink plot, which goes below the lower threshold. This plot is for <b>lot_no 4102</b>. Looking closely, we see that the violin plot for lot_no 4102 is skewed towards the bottom. The next graph shows the violin plot for lot_no 4102 on its own.</p>

In [None]:
warnings.filterwarnings('ignore')

fig = px.violin(res[res.lot_no == '4102'], y="measured_capacity", color = "lot_no", points='all')
fig.add_hline(y = 6.9)
fig.add_hline(y = 6.5)
fig.add_hline(y = 6.0)
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>These battery packs (lot_no = 4102) are within the specification, but their measured capacities sit much lower than those of the other battery lots. A significant number of batteries are concentrated in the lower part of the graph, i.e. with a measured capacity below the rated capacity for this lot (6.5).</p>
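<p style = 'font-size:16px;font-family:Arial;color:#00233C'>What the violin plots show visually can also be expressed as per-lot summary numbers. The sketch below is one possible way to do that with pandas, using hypothetical measurements for two lots; the rated capacity of 6.5 is the value quoted in the text, while lot '3907' and all measured values are made up for illustration.</p>

```python
import pandas as pd

RATED_CAPACITY = 6.5  # rated capacity quoted for lot 4102 in the text

# Hypothetical measurements; the real values come from the test_reports_v view.
tests = pd.DataFrame({
    "lot_no": ["4102"] * 5 + ["3907"] * 5,
    "measured_capacity": [6.05, 6.10, 6.20, 6.30, 6.60,
                          6.45, 6.55, 6.60, 6.70, 6.80],
})

# Summarise each lot: size, minimum, median, and share of cells below the rating.
summary = (
    tests.assign(below_rated=tests["measured_capacity"] < RATED_CAPACITY)
    .groupby("lot_no")
    .agg(
        n=("measured_capacity", "size"),
        minimum=("measured_capacity", "min"),
        median=("measured_capacity", "median"),
        share_below_rated=("below_rated", "mean"),
    )
)
print(summary)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A lot whose median sits below the rated capacity, or whose share of cells below the rating is high, stands out in such a table even when every individual cell is inside the specification.</p>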

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Insights:</b></p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li>Test reports need to be proactively monitored in order to avoid installing faulty batteries</li>
    <li>Tighten up the acceptance criteria of batteries</li>
</ol>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>These initiatives will increase product quality and make sure this doesn't happen again!</p>
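<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a rough illustration of what a tightened acceptance criterion might look like, the sketch below flags a lot when its median measured capacity falls below the rated capacity, or when too many cells fall below it. The function name, the 50% threshold and the sample measurements are all hypothetical; an actual rule would have to be agreed with the quality engineers.</p>

```python
from statistics import median


def lot_needs_review(measurements, rated_capacity, max_share_below=0.5):
    """Return True when a lot looks weak, even if each cell is inside the spec.

    Hypothetical rule: flag the lot if its median is below the rated capacity
    or if more than max_share_below of its cells measure below the rating.
    """
    if not measurements:
        raise ValueError("no measurements for this lot")
    share_below = sum(m < rated_capacity for m in measurements) / len(measurements)
    return median(measurements) < rated_capacity or share_below > max_share_below


# Hypothetical measurements for a weak lot and a healthy lot.
weak_lot = [6.05, 6.10, 6.20, 6.30, 6.60]
healthy_lot = [6.45, 6.55, 6.60, 6.70, 6.80]

print(lot_needs_review(weak_lot, rated_capacity=6.5))     # True
print(lot_needs_review(healthy_lot, rated_capacity=6.5))  # False
```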

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>5. Cleanup</b>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Work Tables/Views</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We need to clean up our work tables/views to prevent errors next time.</p>

In [None]:
db_drop_view(view_name='test_reports_v')

In [None]:
db_drop_table(table_name='test_reports')

<p style = 'font-size:18px;font-family:Arial;color:#00233C'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will use the following code to clean up tables and databases created for this demonstration.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_EVCarBattery');"        # Takes 5 seconds

In [None]:
remove_context()

<hr style="height:2px;border:none;background-color:#00233C;">

<b style = 'font-size:20px;font-family:Arial;color:#00233C'>Required Materials</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let’s look at the elements we have available for reference for this notebook:</p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Filters:</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>Industry:</b> Manufacturing</li>
    <li><b>Functionality:</b> Data Analysis</li>
    <li><b>Use Case:</b> Defect Analysis</li>
</ul>

<b style = 'font-size:20px;font-family:Arial;color:#00233C'>Dataset:</b>

<b><i>bom</i></b> - Bill of materials - contains all major parts that make up each vehicle:
<ul>
    <li><code>id</code>: unique identifier</li>
    <li><code>vin</code>: vehicle identification number</li>
    <li><code>part_no</code>: part number</li>
    <li><code>vendor_id</code>: vendor the part was produced by (unused)</li>
    <li><code>lot_no</code>: lot number from the vendor</li>
    <li><code>quantity</code>: how many of this part are in the vehicle</li>
</ul>

<b><i>dealers</i></b> - Vehicle sales and distributors:
<ul>
    <li><code>id</code>: unique identifier</li>
    <li><code>Company</code>: company name</li>
    <li><code>StreetAddress</code>: street address</li>
    <li><code>City</code>: city</li>
    <li><code>State</code>: state</li>
    <li><code>ZipCode</code>: postcode</li>
    <li><code>Country</code>: country</li>
    <li><code>EmailAddress</code>: main email address</li>
    <li><code>TelephoneNumber</code>: telephone number</li>
    <li><code>DomainName</code>: URL for company website</li>
    <li><code>Latitude</code>: latitude (location)</li>
    <li><code>Longitude</code>: longitude (location)</li>
</ul>

<b><i>mfg_plants</i></b> - Manufacturing facilities:
<ul>
    <li><code>id</code>: unique identifier</li>
    <li><code>Company</code>: facility name</li>
    <li><code>StreetAddress</code>: street address</li>
    <li><code>City</code>: city</li>
    <li><code>State</code>: state</li>
    <li><code>ZipCode</code>: postcode</li>
    <li><code>Country</code>: country</li>
    <li><code>EmailAddress</code>: main email address</li>
    <li><code>TelephoneNumber</code>: telephone number</li>
    <li><code>DomainName</code>: URL for plant website</li>
    <li><code>Latitude</code>: latitude (location)</li>
    <li><code>Longitude</code>: longitude (location)</li>
</ul>

<b><i>parts</i></b> - Master list of parts for all vehicles:
<ul>
    <li><code>part_no</code>: unique part number</li>
    <li><code>description</code>: part description</li>
</ul>

<b><i>vehicles</i></b> - Vehicles we have built/are building:
<ul>
    <li><code>vin</code>: unique identifier</li>
    <li><code>yr</code>: model year</li>
    <li><code>model</code>: vehicle model code</li>
    <li><code>customer_id</code>: customer / purchaser</li>
    <li><code>dealer_id</code>: dealer where vehicle was sold/delivered</li>
    <li><code>mfg_plant_id</code>: plant where the vehicle was assembled</li>
</ul>

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2023, 2024. All Rights Reserved
        </div>
    </div>
</footer>