Deeksha-Gokarn/Descriptive-Statistical-Analysis-using-Python

This repo works with raw, unstructured big data. The dataset is a web page produced by automotive testing; it consists of a long series of HTML and XML content and is very large. In the data pre-processing stage, the XML and HTML content is parsed, as shown in Testfall.py, Teschritte.py, and Inhalt.py. The parsed data is then stored in a database. The data extracted from the database is in a transformed, structured format, as shown in the Database file, and is then loaded into a data frame. Together, these steps form an ETL (extract, transform, load) process. The resulting structured data is more efficient to work with and more human-readable. Statistical analysis, specifically descriptive analysis, is then performed on the structured data shown in . Additionally, error patterns, total error counts, and repeated-testing statistics are visualized with matplotlib histograms, as shown in the matplotlib file.
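
The flow described above (parse, store in a database, read back, summarize) can be sketched roughly as follows. This is only an illustration using the Python standard library, not the code in this repo: the XML layout, the `results` table, and the function names `extract`, `load`, and `describe` are all assumptions, since the real report format and outputs are not published.

```python
import sqlite3
import statistics
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical report fragment; the real test-report schema is not public.
SAMPLE_XML = """
<report>
  <testcase name="TC_01" run="1" errors="2"/>
  <testcase name="TC_01" run="2" errors="0"/>
  <testcase name="TC_02" run="1" errors="5"/>
  <testcase name="TC_03" run="1" errors="1"/>
</report>
"""


def extract(xml_text):
    """Extract: parse the XML and return (name, run, errors) tuples."""
    root = ET.fromstring(xml_text.strip())
    return [
        (tc.get("name"), int(tc.get("run")), int(tc.get("errors")))
        for tc in root.iter("testcase")
    ]


def load(conn, rows):
    """Load: store the parsed rows in a (hypothetical) 'results' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (name TEXT, run INTEGER, errors INTEGER)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    conn.commit()


def describe(conn):
    """Descriptive statistics over the structured data read back from the database."""
    errors = [row[0] for row in conn.execute("SELECT errors FROM results")]
    runs = Counter(row[0] for row in conn.execute("SELECT name FROM results"))
    return {
        "count": len(errors),
        "total_errors": sum(errors),
        "mean": statistics.mean(errors),
        "median": statistics.median(errors),
        "stdev": statistics.stdev(errors),
        # Test cases that were executed more than once
        "repeated_tests": {name: n for name, n in runs.items() if n > 1},
    }


conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
load(conn, extract(SAMPLE_XML))
summary = describe(conn)
print(summary)
```

With pandas installed, the same table could instead be read into a data frame with `pandas.read_sql_query` and summarized with `DataFrame.describe()`; the error counts could then be plotted as a histogram with `matplotlib.pyplot.hist`.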

THE OUTPUTS OF THE FILES IN THIS REPO ARE NOT UPLOADED SINCE THEY CONTAIN PRIVATE/PERSONAL DATA.