abdulahadakram/postcode-csv-extractor

UK Postcodes Processor

A Python script that processes the UK postcode dataset from Doogal.co.uk in chunks, keeps only active postcodes, and extracts just the essential columns.

Features

  • Memory Efficient: Processes large CSV files in chunks to handle datasets with millions of rows
  • Fast Processing: Optimized for speed when dealing with large postcode datasets
  • Filtered Output: Only keeps active postcodes (where "In Use?" = "Yes")
  • Essential Columns: Extracts only the most commonly needed columns:
    • Postcode
    • Latitude
    • Longitude
    • District
    • Country

Prerequisites

  • Python 3.7 or higher
  • pip (Python package installer)

Installation

  1. Clone this repository:
git clone <repository-url>
cd postcodes
  2. Create a virtual environment:
python -m venv venv
  3. Activate the virtual environment:

    • Windows (PowerShell):
      venv\Scripts\Activate.ps1
    • macOS/Linux:
      source venv/bin/activate
  4. Install dependencies:

pip install -r requirements.txt

Usage

  1. Download the UK postcodes CSV file from Doogal.co.uk
  2. Place the postcodes.csv file in the project directory
  3. Run the processing script:
python process_postcodes.py

The script will:

  • Process the CSV file in chunks of 100,000 rows
  • Filter for active postcodes only
  • Extract the 5 essential columns
  • Save the result to active_postcodes.csv
  • Display progress updates during processing
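
The steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the contents of process_postcodes.py: the function name extract_active_postcodes and its parameters are hypothetical, and the column headers are assumed to match the Doogal export.

```python
import pandas as pd

# Assumed column headers, taken from the lists in this README.
COLUMNS = ["Postcode", "Latitude", "Longitude", "District", "Country"]
STATUS_COLUMN = "In Use?"


def extract_active_postcodes(src, dst, chunk_size=100_000):
    """Stream src in chunks, keep active rows, write the essential columns to dst.

    Hypothetical helper; returns the number of rows written.
    """
    written = 0
    reader = pd.read_csv(
        src,
        usecols=COLUMNS + [STATUS_COLUMN],  # skip unused columns at parse time
        dtype=str,                          # keep values exactly as they appear in the file
        chunksize=chunk_size,               # yields DataFrames of at most chunk_size rows
    )
    for i, chunk in enumerate(reader):
        active = chunk.loc[chunk[STATUS_COLUMN] == "Yes", COLUMNS]
        # The first chunk creates the file with a header; later chunks append.
        active.to_csv(dst, mode="w" if i == 0 else "a", header=(i == 0), index=False)
        written += len(active)
        print(f"chunk {i + 1}: {written} active rows so far")
    return written
```

Calling extract_active_postcodes("postcodes.csv", "active_postcodes.csv") would mirror the file names used in this README. Passing usecols keeps memory per chunk low because the unused columns are never loaded.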

Output

The script generates active_postcodes.csv with the following structure:

Postcode,Latitude,Longitude,District,Country
AB1 0AA,57.101474,-2.242851,Aberdeen City,Scotland
AB1 0AB,57.102554,-2.246308,Aberdeen City,Scotland
...
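
A quick way to confirm that a generated file has this structure is sketched below, using only the standard library. The helper name validate_output is hypothetical, and the coordinate range check is an assumption about what counts as valid; rows with blank coordinates, if the dataset contains any, would be reported as errors.

```python
import csv

EXPECTED_HEADER = ["Postcode", "Latitude", "Longitude", "District", "Country"]


def validate_output(path):
    """Return the number of data rows; raise ValueError on a bad header or coordinate."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_HEADER:
            raise ValueError(f"unexpected header: {reader.fieldnames}")
        count = 0
        # Data starts on line 2 because line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            try:
                lat = float(row["Latitude"])
                lon = float(row["Longitude"])
            except (TypeError, ValueError):
                raise ValueError(f"line {line_no}: non-numeric coordinate") from None
            if not (-90 <= lat <= 90 and -180 <= lon <= 180):
                raise ValueError(f"line {line_no}: coordinate out of range")
            count += 1
    return count
```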

Performance

  • Chunk Size: 100,000 rows are read and filtered at a time
  • Memory Usage: Only one chunk is held in memory at once, so large files (2M+ rows) can be processed
  • Output Size: Approximately 66% of postcodes are active (varies by dataset)
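
Because the active share varies by dataset, it can be measured for a particular download. The sketch below streams the input row by row with the standard library; the helper name active_share is hypothetical, and the "In Use?" header and "Yes" value are assumed from the Features section.

```python
import csv


def active_share(path, status_column="In Use?"):
    """Return the fraction of rows whose status column equals "Yes" (0.0 for an empty file)."""
    total = 0
    active = 0
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # one row in memory at a time
            total += 1
            if row[status_column] == "Yes":
                active += 1
    return active / total if total else 0.0
```

For example, print(f"{active_share('postcodes.csv'):.1%}") would report the share for the file placed in the project directory.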

Data Source

This project processes data from Doogal.co.uk, which provides comprehensive UK postcode information including:

  • Full list of UK postcodes (active and inactive)
  • Geographic coordinates
  • Administrative boundaries
  • Population data

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Acknowledgments

  • Doogal.co.uk for providing the comprehensive UK postcodes dataset
  • The pandas library for efficient CSV processing
  • The Python community for excellent data processing tools

Support

If you encounter any issues or have questions, please:

  1. Check the FAQ for common solutions
  2. Open an issue on GitHub
  3. Contact the maintainers

Changelog

v1.0.0

  • Initial release
  • Basic postcode processing functionality
  • Memory-efficient chunk processing
  • Active postcode filtering

About

Efficiently extracts active UK postcodes and key location fields from large CSV datasets.
