K-Kazemian/python-csv-extractor

Python Medicine Data Extractor

This is a simple Python script to read a complex CSV file of medicine data and extract specific columns into a new, clean CSV file.

This project was written as part of an educational conversation to solve the problem of reading CSV files with an unknown file encoding.

What This Project Does

The medicines.csv file (sourced from the European Medicines Agency, EMA, website) has 8 extra header rows and many columns. This script:

Skips the first 8 rows of the file.

Reads only the 'Name of medicine', 'Active substance', and 'Therapeutic area (MeSH)' columns.

Creates a new clean output file named extracted_medicines.csv using the standard 'utf-8' encoding.
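The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the contents of process_medicines.py: the function name extract_columns and its parameters are hypothetical, the cp1250 default only mirrors the encoding this README says is being tested, and the sketch assumes the real column header is the first line after the 8 skipped rows.

```python
import csv

# Column names taken from this README; everything else is illustrative.
COLUMNS = ["Name of medicine", "Active substance", "Therapeutic area (MeSH)"]


def extract_columns(src_path, dst_path, encoding="cp1250",
                    skip_rows=8, columns=COLUMNS):
    """Copy the selected columns to dst_path and return the number of data rows written."""
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        # Discard the preamble lines that precede the real header row.
        for _ in range(skip_rows):
            next(src, None)

        reader = csv.reader(src)
        header = next(reader)
        # header.index raises ValueError if a requested column is missing.
        indexes = [header.index(name) for name in columns]

        writer = csv.writer(dst)
        writer.writerow(columns)

        written = 0
        for row in reader:
            if not row:
                continue  # ignore completely empty lines
            # Pad short rows with empty strings instead of failing.
            writer.writerow([row[i] if i < len(row) else "" for i in indexes])
            written += 1
    return written
```

Under these assumptions, extract_columns("medicines.csv", "extracted_medicines.csv") would write the three columns to a UTF-8 file and return the number of medicine rows copied.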

Files

process_medicines.py: The main Python script.

medicines.csv: The raw input data file (included in this repository for testing).

How to Use It

Make sure you have Python 3 installed on your system.

Download this repository (Download ZIP) or clone it.

Open your terminal (or Command Prompt) in the project folder.

Run the following command:

python process_medicines.py

(Depending on your system, you may need to use python3 instead of python.)

If the script succeeds, the output file extracted_medicines.csv is created in the same folder.

The Encoding Challenge

The main challenge of this project was finding the correct file encoding for medicines.csv. The following encodings were tested:

utf-8: Failed

windows-1252: Failed

latin-1: Failed

cp1250: Currently being tested

The script in this repository uses cp1250 as its current attempt. Whether this encoding decodes the file correctly has not yet been confirmed.
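One way to compare the candidate encodings systematically is to decode the raw bytes with each one and record where decoding breaks. The sketch below is an assumption about how such a check could look, not the procedure actually used for this project; the helper name probe_encodings is hypothetical. Note that latin-1 maps every possible byte to a character, so it never raises a decoding error: a clean decode only shows that the bytes are valid in that encoding, not that the resulting text is correct, and the decoded output still needs to be inspected by eye.

```python
def probe_encodings(raw, candidates=("utf-8", "windows-1252", "latin-1", "cp1250")):
    """Try to decode raw bytes with each candidate encoding.

    Returns a dict mapping each encoding name to None when decoding
    succeeds, or to a short description of the first failure.
    """
    results = {}
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that could not be decoded.
            results[name] = f"byte 0x{raw[exc.start]:02x} at offset {exc.start}: {exc.reason}"
        else:
            results[name] = None
    return results
```

For example, reading the file in binary mode with open("medicines.csv", "rb") and passing the bytes to probe_encodings would list, for each encoding, either None or the first offending byte, which helps narrow down which part of the file causes the failure.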
