In [2]:
import pandas as pd
import re
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Summary of paper: Fast retrievals of test-pad coordinates from photo images of printed circuit boards, by Swee Chuan Tan and Schumann Tong Wei Kit, 2016 International Conference on Advanced Mechatronic Systems (ICAMechS), 2016 [1]

### The paper I will be summarizing can be found here: https://ieeexplore.ieee.org/document/7813492/authors#authors
### The data used in this paper, Printed Circuit Board Processed Image by Swee Chuan Tan [10], can be found here: https://archive.ics.uci.edu/dataset/990/printed+circuit+board+processed+image

## Explanation of terms
1. Printed Circuit Board (PCB) - A printed circuit board is an essential unit of an electronic device which serves as the base for connecting and powering components to create a single, fully functional electronic circuit that can power and control the device [2]. It is a mechanical base used to hold and connect the components of an electric circuit. PCBs are used in nearly all modern consumer electronic devices and accessories, including phones, tablets, smartwatches, wireless chargers, and power supplies [3].

2. Test-pad of PCB - A test point in a PCB is an exposed copper pad that can be used to check whether a circuit is functioning to specification. During production, users can inject test signals via probes through the test points to detect potential issues [4]. The pad exposes a signal or power net so that an external probe can connect to it. Bare PCBs are usually tested for short- and open-circuits using the component pads themselves, and this is usually done with a “flying probe” tester [5].

3. Reverse engineering - Reverse-engineering is the act of dismantling an object to see how it works. It is done primarily to analyze and gain knowledge about the way something works but often is used to duplicate or enhance the object.[6] 

4. Flying probe testers - A flying probe test is a testing method for electronic circuits primarily used to test PCBs. It employs a system of movable probes that virtually "fly" over the circuit board, making electrical contact with specific test points on the PCB. Ultimately, it identifies defects and verifies the electrical performance of the circuit. The flying probe test system consists of several key components, including the probes themselves, a test fixture to hold the PCB in place, and control software to manage the testing process [7].

5. Short circuit - A bad electrical connection that causes the current to flow in the wrong direction, often having the effect of stopping the power supply. [8]

6. Open circuit - A circuit in which the continuity is broken, so that electric current cannot flow. [9]

7. Two-Step cluster - The TwoStep Cluster node provides a form of cluster analysis. It can be used to cluster the dataset into distinct groups when you don't know what those groups are at the beginning. As with Kohonen nodes and K-Means nodes, TwoStep Cluster models do not use a target field. Instead of trying to predict an outcome, TwoStep Cluster tries to uncover patterns in the set of input fields. Records are grouped so that records within a group or cluster tend to be similar to each other, but records in different groups are dissimilar. [11] 

## Summary 
### The aim of this paper
In the paper "Fast retrievals of test-pad coordinates from photo images of printed circuit boards", Tan and Kit (2016) [1] present a new data analytic technique for recovering test-pad information from photo images of printed circuit boards (PCBs). This technique is particularly useful for reverse engineering PCBs in situations where documentation is incomplete or unavailable. The retrieved test-pad coordinates serve as crucial input for robotic flying probe testers, which are mechatronic systems capable of performing a wide range of diagnostic tests on PCBs without requiring circuit board schematics.
### Introductory problem statement
The authors highlight the growing importance of data analytics in solving complex and tedious problems, such as reverse engineering of electronic components. Legacy systems, often deployed in industrial or military applications, typically lack adequate documentation, which complicates maintenance and repair tasks. Issues such as missing schematics, obsolete components, and insufficient vendor support are common. Among these challenges, the absence of circuit-board-level information can significantly hinder repair processes.

To address this issue, the paper proposes a method to automate the identification of test pads on PCB images. Test pads are essential for guiding robotic flying probe testers in conducting diagnostic tasks such as connectivity tests, component testing, and voltage or impedance measurements. While manual identification of test pads is feasible, it is time-consuming and prone to human error. The proposed approach leverages clustering techniques to automate this process, thereby enhancing efficiency and accuracy.

### Methods of reverse engineering and the approach the authors take
The paper highlights two main approaches to PCB reverse engineering: the destructive approach, which involves physically deconstructing the PCB to recover detailed design information but causes permanent damage, and the non-destructive approach, which preserves the PCB and uses techniques like image analysis to extract specific details, such as netlists. The choice between these methods depends on the amount of information needed and the feasibility of preserving the board. This paper focuses on a non-destructive method for identifying test-pad locations using clustering analysis, emphasizing the need for high precision and recall to minimize errors while maintaining the PCB's integrity.

### Data and methods used in the paper
**1. The analysis begins** with a digital image of a printed circuit board (PCB), provided with permission from the image owner. Using a Java program, the location and color attributes of each pixel in the image are extracted into a numerical dataset. Each pixel is described by five variables: X and Y, which indicate the pixel's horizontal and vertical positions, and R, G, and B, which represent the red, green, and blue color intensities respectively, each ranging from 0 to 255. The dataset comprises 71,040 observations, corresponding to the 71,040 pixels in the image. The primary goal of this project is to identify the locations of test pads, which appear as gray circular dots on the PCB.
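To make the pixel-to-table step concrete, here is a minimal Python sketch (the paper used a Java program, so this is only an illustration, not the authors' code). The function name `image_to_xyrgb`, the tiny synthetic image, and the choice of the top-left pixel as the origin are all assumptions.

```python
import numpy as np
import pandas as pd

def image_to_xyrgb(image: np.ndarray) -> pd.DataFrame:
    """Flatten a (height, width, 3) RGB array into one row per pixel."""
    height, width, _ = image.shape
    # np.indices returns the row (Y) and column (X) index of every pixel
    ys, xs = np.indices((height, width))
    return pd.DataFrame({
        "X": xs.ravel(),
        "Y": ys.ravel(),
        "R": image[..., 0].ravel(),
        "G": image[..., 1].ravel(),
        "B": image[..., 2].ravel(),
    })

# Tiny synthetic 2x3 image standing in for the PCB photo (assumption)
demo_image = np.zeros((2, 3, 3), dtype=np.uint8)
demo_image[1, 2] = (131, 147, 153)  # one grayish pixel at X=2, Y=1
demo_table = image_to_xyrgb(demo_image)
print(demo_table)
```

Each row of `demo_table` corresponds to one pixel, which is the same layout as the five-variable dataset described above.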

To achieve this, a two-stage clustering approach is employed. The first stage groups the pixels by their color attributes to identify gray pixels corresponding to test pads. The second stage clusters these gray pixels based on their spatial locations to define individual test pads while filtering out noise. This method ensures accurate identification of test pads with minimal errors.

**2. In the first stage of the method**, the focus is on clustering the pixels from the PCB image based on their color attributes (R, G, and B values). The PCB image contains several predominant colors, including various shades of green, white, black, and gray. The objective in this stage is to isolate clusters of gray pixels, as these correspond to the test pads on the PCB. Gray pixels are identified based on the common characteristic where the red (R), green (G), and blue (B) color intensities are approximately equal (R ≈ G ≈ B).
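One simple way to quantify R ≈ G ≈ B is the spread between the largest and smallest channel. The sketch below is an illustration under that assumption; the helper `channel_spread`, the example colours, and the tolerance of 30 are hypothetical and not taken from the paper.

```python
import numpy as np

def channel_spread(rgb) -> np.ndarray:
    """Largest minus smallest channel per colour; 0 means R = G = B."""
    rgb = np.asarray(rgb, dtype=int)
    return rgb.max(axis=-1) - rgb.min(axis=-1)

colours = np.array([
    [128, 128, 128],  # ideal gray
    [131, 147, 153],  # grayish
    [20, 120, 40],    # PCB-like green
])
spreads = channel_spread(colours)
is_gray = spreads <= 30  # tolerance of 30 is an illustrative assumption
print(spreads, is_gray)
```

Note that pure black and pure white also have zero spread, so in practice a brightness band is needed as well to separate gray from those two colours.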

To achieve this, the K-Means clustering algorithm is applied to the RGB data. The estimated number of clusters ranges from 10 to 12, reflecting the variety of colors present in the PCB image. Three clustering solutions (10, 11, and 12 clusters) are generated to determine the most suitable one for isolating gray pixels. The 12-cluster solution is found to be optimal, as it includes two clusters with average RGB values closest to gray: Cluster 1 (R=131, G=147, B=153) and Cluster 2 (R=93, G=113, B=115).
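The colour-clustering step can be sketched as follows. This is a plain NumPy version of Lloyd's K-Means with a deterministic farthest-first initialisation, run on synthetic pixels with 4 colour groups instead of the 10 to 12 clusters used in the paper. It is not the authors' implementation, and the palette, the spread tolerance, and the brightness band are all assumptions made for illustration.

```python
import numpy as np

def kmeans(points: np.ndarray, k: int, n_iter: int = 20):
    """Lloyd's K-Means with farthest-first initialisation (deterministic)."""
    points = points.astype(float)
    centroids = [points[0]]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen centroid
        diffs = points[:, None] - np.array(centroids)[None]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        centroids.append(points[np.argmax(nearest)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids

# Synthetic pixels: green, white, black, and grayish groups (assumed palette)
rng = np.random.default_rng(0)
palette = np.array([[30, 120, 50], [240, 240, 240], [10, 10, 10], [130, 147, 152]])
pixels = np.vstack([c + rng.integers(-5, 6, size=(50, 3)) for c in palette])

labels, centroids = kmeans(pixels, k=4)

# Keep clusters whose mean colour is roughly gray: small channel spread
# and mid-range brightness (both thresholds are illustrative assumptions)
spread = centroids.max(axis=1) - centroids.min(axis=1)
brightness = centroids.mean(axis=1)
gray_clusters = np.where((spread <= 30) & (brightness > 60) & (brightness < 200))[0]
gray_pixels = pixels[np.isin(labels, gray_clusters)]
print(centroids.round(1), gray_clusters)
```

On real data the number of clusters and the thresholds would have to be chosen by inspecting the cluster means, as the paper does when comparing the 10-, 11-, and 12-cluster solutions.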

These two clusters are identified as representing gray pixels corresponding to test pads. To validate this identification, the spatial distribution of the gray pixel records is plotted using their X and Y coordinates. The resulting scatter plot demonstrates that the layout of these gray pixels closely resembles the actual distribution of test pads on the PCB, confirming the effectiveness of the clustering process in this stage.

**3. In the second stage**, gray pixels identified earlier are clustered based on their spatial coordinates (X and Y) to locate individual test pads. While the PCB contains 120 legitimate test pads, approximately 50 sporadic gray pixels act as noise. To address this, 170 clusters are created to separate noise, and clusters with fewer than 10 pixels are excluded as they likely represent noise. The centroids of the remaining clusters are identified as test-pad locations.
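The filtering and centroid step can be illustrated with a short pandas sketch. It assumes the spatial cluster labels have already been produced by some clustering method; the function name `pad_centroids`, the parameter `min_pixels`, and the toy coordinates are hypothetical.

```python
import pandas as pd

def pad_centroids(xy: pd.DataFrame, labels, min_pixels: int = 10) -> pd.DataFrame:
    """Mean X and Y per spatial cluster, dropping clusters below min_pixels."""
    grouped = xy.assign(cluster=labels).groupby("cluster")
    summary = grouped.agg(X=("X", "mean"), Y=("Y", "mean"), n_pixels=("X", "size"))
    return summary[summary["n_pixels"] >= min_pixels]

# Toy example: two pads of 12 pixels each plus two stray noise pixels
demo_xy = pd.DataFrame({
    "X": [10] * 12 + [50] * 12 + [80, 3],
    "Y": [20] * 12 + [60] * 12 + [5, 90],
})
demo_labels = [0] * 12 + [1] * 12 + [2, 3]
pads = pad_centroids(demo_xy, demo_labels)
print(pads)
```

Clusters 2 and 3 contain a single pixel each, so they fall below the 10-pixel threshold and are discarded as noise, leaving the two pad centroids.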

Two-Step clustering outperforms K-Means in this task due to its pre-clustering step, which isolates sporadic pixels into smaller clusters, preventing them from merging with legitimate test-pad clusters. K-Means, in contrast, is more prone to errors as it can misclassify noise as test pads. The Two-Step method thus ensures more accurate and reliable test-pad identification by effectively handling noise.

This two-stage clustering process successfully identifies the test-pad locations on the PCB. By combining color-based clustering to isolate gray pixels with spatial clustering to group them into test pads, the method ensures high precision and recall. The use of Two-Step clustering further enhances accuracy by effectively handling noise, making this approach a reliable and efficient solution for PCB reverse engineering.

### Results
The results of the Two-Step clustering method demonstrate high accuracy: all 120 legitimate test pads on the PCB were correctly identified. An overlay of the clustering output on the PCB image confirms the alignment of cluster centroids with test pad locations. However, upon closer inspection, some errors are noted: five points were incorrectly identified as test pads, and three duplicate test pads were generated for larger test pads. These eight irrelevant points reduce the precision to 93.75% (120 out of 128 identified points), though the recall remains at 100%.
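The reported figures can be checked with a few lines of Python. The counts come from the results above (120 correct pads, 5 false detections, 3 duplicates); the zero false negatives are inferred from the stated 100% recall, and the function name is just for illustration.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall) from raw detection counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# 5 wrongly detected points + 3 duplicates = 8 false positives
precision, recall = precision_recall(120, 5 + 3, 0)
print(f"precision={precision:.2%}, recall={recall:.0%}")
# prints: precision=93.75%, recall=100%
```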

Importantly, the errors are minimal and can be easily corrected through visual inspection or post-processing, showcasing the robustness and reliability of the two-stage clustering approach for recovering test-pad locations.

## Reproducing the experiment 

1. For that we will be using the Printed Circuit Board dataset from UC Irvine [10]
- This is the same preprocessed dataset, with 71,040 observations and five features (X, Y, R, G, B)

2. Let's first read the data

In [12]:
# Assumes the CSV downloaded from the UCI repository [10] is in the working directory
data_pcb = pd.read_csv('TestPad_PCB_XYRGB_V2.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'TestPad_PCB_XYRGB_V2.csv'
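The error above means the CSV file was not found at that relative path, that is, it was not in the notebook's working directory. A small guard like the following gives a clearer message and checks the columns after loading. It is only a sketch: the function name `load_pcb_pixels` is hypothetical, and the header names X, Y, R, G, B are assumed from the dataset description rather than verified against the file.

```python
from pathlib import Path
import pandas as pd

EXPECTED_COLUMNS = ["X", "Y", "R", "G", "B"]  # assumed header names

def load_pcb_pixels(csv_path) -> pd.DataFrame:
    """Read the pixel table, failing early with a helpful message."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found (working directory: {Path.cwd()}); "
            "download it from the UCI repository first"
        )
    data = pd.read_csv(path)
    missing = [c for c in EXPECTED_COLUMNS if c not in data.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return data
```

It would be called as `data_pcb = load_pcb_pixels('TestPad_PCB_XYRGB_V2.csv')` once the file has been placed next to the notebook.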

## References
[1] Tan, S. C., & Kit, S. T. W. (2016). Fast retrievals of test-pad coordinates from photo images of printed circuit boards. 2016 International Conference on Advanced Mechatronic Systems (ICAMechS), 464–467. https://doi.org/10.1109/icamechs.2016.7813492

[2] ALFA BRAVO. (n.d.). Circuit board components: Learn the types and their usage on PCB. Vector Blue Hub. https://vectorbluehub.com/circuit-board-components#:~:text=A%20printed%20circuit%20board%20is,%2D%20and%20double%2Dsided%20designs.

[3] What is a printed circuit board (PCB)? (n.d.). Ansys. Retrieved November 19, 2024, from https://www.ansys.com/simulation-topics/what-is-a-printed-circuit-board

[4] What are PCB Test points? - PCB Directory. (n.d.). https://www.pcbdirectory.com/community/what-are-pcb-test-points

[5] What are PCB Testpoints? (n.d.). https://www.labcenter.com/blog/pcb-testpoints/#:~:text=Testpoints%20are%20pads%20on%20the,a%20%E2%80%9Cflying%20probe%E2%80%9D%20tester.

[6] Lutkevich, B. (2021, June 10). reverse-engineering. Search Software Quality. https://www.techtarget.com/searchsoftwarequality/definition/reverse-engineering#:~:text=Reverse%2Dengineering%20is%20the%20act,duplicate%20or%20enhance%20the%20object.

[7] Sufyan, M. (2023, September 28). Flying Probe Test: an extensive guide to the technology and applications. Wevolver. https://www.wevolver.com/article/flying-probe-test-an-extensive-guide-to-the-technology-and-applications

[8] short circuit. (2024). https://dictionary.cambridge.org/dictionary/english/short-circuit

[9] Byju’s. (2022, July 4). What does the term open circuit mean? https://byjus.com/question-answer/the-term-open-circuit-means/#:~:text=A%20circuit%20in%20which%20the,dissipates%20from%20an%20open%20circuit.

[10] UCI Machine Learning Repository. (n.d.). https://archive.ics.uci.edu/dataset/990/printed+circuit+board+processed+image

[11] IBM Cloud PAK for Data 5.0.X. (n.d.). https://www.ibm.com/docs/en/cloud-paks/cp-data/5.0.x?topic=modeling-twostep-cluster-node