GitHub - K-Kazemian/python-csv-extractor: A Python script to extract specific columns from a complex medicine CSV file.

Python Medicine Data Extractor

This is a simple Python script to read a complex CSV file of medicine data and extract specific columns into a new, clean CSV file.

This project was written as part of an educational conversation to solve the problem of reading CSV files with an unknown file encoding.

What This Project Does

The medicines.csv file (sourced from the European Medicines Agency (EMA) website) has 8 extra rows above the real column header, and many more columns than are needed. This script:

Skips the first 8 rows of the file.

Reads only the 'Name of medicine', 'Active substance', and 'Therapeutic area (MeSH)' columns.

Creates a new clean output file named extracted_medicines.csv using the standard 'utf-8' encoding.
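The three steps above can be sketched with Python's standard csv module. This is a minimal illustration, not the contents of process_medicines.py: the function name extract_columns and the constants are hypothetical, and cp1250 as the input encoding is the value still under test (see The Encoding Challenge). Only the column names, the row count of 8, and the utf-8 output encoding come from this README.

```python
import csv

# Column names and the number of rows to skip come from this README.
WANTED_COLUMNS = ["Name of medicine", "Active substance", "Therapeutic area (MeSH)"]
HEADER_ROWS_TO_SKIP = 8


def extract_columns(src_path, dst_path, encoding="cp1250"):
    """Copy WANTED_COLUMNS from src_path to dst_path and return the row count.

    Hypothetical helper; the default input encoding is an assumption.
    """
    with open(src_path, newline="", encoding=encoding) as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)

        # Step 1: skip the extra rows that precede the real column header.
        for _ in range(HEADER_ROWS_TO_SKIP):
            next(reader, None)

        # Step 2: locate the wanted columns in the real header row.
        # header.index raises ValueError if a column name is missing.
        header = next(reader)
        indices = [header.index(name) for name in WANTED_COLUMNS]

        # Step 3: write only those columns to a UTF-8 output file.
        writer = csv.writer(dst)
        writer.writerow(WANTED_COLUMNS)
        written = 0
        for record in reader:
            if not record:  # ignore completely blank lines
                continue
            # Pad short rows with empty strings instead of failing.
            writer.writerow([record[i] if i < len(record) else "" for i in indices])
            written += 1
    return written
```

Calling extract_columns("medicines.csv", "extracted_medicines.csv") would then produce the output file described above. Looking columns up by name rather than by position keeps the sketch working if the source file reorders its columns.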

Files

process_medicines.py: The main Python script.

medicines.csv: The raw input data file (included in this repository for testing).

How to Use It

Make sure you have Python 3 installed on your system.

Download this repository (using Download ZIP) or clone it.

Open your terminal (or Command Prompt) in the project folder.

Run the following command:

python process_medicines.py

(Depending on your system, you may need to use python3 instead of python.)

If the script succeeds, the output file extracted_medicines.csv is created in the same folder.

The Encoding Challenge

The main challenge of this project was finding the correct file encoding for medicines.csv. The following encodings were tested:

utf-8: Failed

windows-1252: Failed

latin-1: Failed

cp1250: (Currently testing)

The script in this repository currently uses cp1250. Whether that encoding decodes the file correctly is still being tested.
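A trial like the one listed above can be automated by decoding the raw bytes with each candidate encoding and recording which ones raise an error. The sketch below is an assumption about how such a check could look, not the method used in this project, and try_encodings is a hypothetical name. Note that latin-1 maps every possible byte to a character, so it never raises UnicodeDecodeError; a "Failed" result for latin-1 most likely means the text came out with wrong characters, which this kind of check cannot detect.

```python
def try_encodings(path, candidates=("utf-8", "windows-1252", "latin-1", "cp1250")):
    """Return {encoding: None if decoding succeeded, else the error message}.

    Hypothetical helper. A successful decode only means no byte was invalid
    for that encoding; it does not prove the characters are the right ones.
    """
    with open(path, "rb") as f:
        raw = f.read()

    results = {}
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError as exc:
            results[name] = str(exc)
        else:
            results[name] = None
    return results
```

Because several encodings can decode the same bytes without error, it still helps to open the output and check a few names that contain accented characters by eye.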
