NasserKhalili/wos-format-converter

Web of Science Format Converter

This repository contains two Python scripts that allow you to convert Web of Science bibliographic data between the following formats:

  • TabDelimited.txt → WOS.xlsx and PlainText.txt
  • Filtered WOS.xlsx → Filtered TabDelimited.txt and Filtered PlainText.txt

These tools are particularly useful when you:

  • Need to extract cited references (which the default WOS Excel export omits).
  • Filter documents (for example, while screening records under the PRISMA guidelines) and need to export the cleaned dataset.

🔧 Scripts Overview

1. WOS_Converter_TabDelimited_to_xlsx_PlainText.py

Purpose: Converts the WOS TabDelimited.txt export file to:

  • A full-featured Excel file (WOS.xlsx)
  • A plain-text Web of Science format file (PlainText.txt) that includes full tagging (e.g., AU, CR, etc.)

Use this script right after downloading from Web of Science.
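
For orientation, here is a minimal sketch of the tab-delimited to tagged plain-text step. It is not the repository's actual script. It assumes the first line of the export holds the two-letter WOS field tags, that multi-value fields are separated by "; ", and that the tagged file opens with FN/VR lines and closes with EF. The names MULTI_VALUE_TAGS, record_to_plaintext, and tab_delimited_to_plaintext are hypothetical, and the set of multi-value tags is deliberately incomplete.

```python
import csv
import io

# Assumption: these fields hold "; "-separated values that go one per line
# in the tagged format. A real converter would likely cover more tags.
MULTI_VALUE_TAGS = {"AU", "AF", "CR"}


def record_to_plaintext(record):
    """Render one record (dict of WOS tag -> value) as tagged plain-text lines."""
    lines = []
    for tag, value in record.items():
        # Skip surplus columns (tag is None) and empty cells.
        if not tag or value is None or not value.strip():
            continue
        parts = value.split("; ") if tag in MULTI_VALUE_TAGS else [value]
        parts = [p.strip() for p in parts if p.strip()]
        if not parts:
            continue
        lines.append(f"{tag} {parts[0]}")
        # Continuation lines are indented by three spaces instead of a tag.
        lines.extend(f"   {p}" for p in parts[1:])
    lines.append("ER")  # end of record
    return "\n".join(lines)


def tab_delimited_to_plaintext(text):
    """Convert the text of a tab-delimited export to tagged plain text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    body = "\n\n".join(record_to_plaintext(row) for row in reader)
    # Assumption: header lines as found in recent WOS plain-text exports.
    return "FN Clarivate Analytics Web of Science\nVR 1.0\n" + body + "\n\nEF\n"
```

To try it on a real export, read the file with open("TabDelimited.txt", encoding="utf-8-sig", newline="") so that a leading byte-order mark, if present, does not end up in the first tag.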


2. WOS_Converter_Filtered_xlsx_to_TabDelimitedText_PlainText.py

Purpose: After filtering your Excel file (e.g., manually or as part of a PRISMA screening), use this script to:

  • Reconstruct the tab-delimited format (TabDelimited.txt)
  • Recreate the tagged plain-text format (PlainText.txt) for further processing

Use this after filtering your Excel (WOS_Filtered.xlsx) to retain only relevant records.
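
The reverse direction can be sketched in the same spirit. The helper below is hypothetical and not taken from the repository. It assumes the filtered records are already available as dictionaries keyed by WOS tag, and it writes one header line followed by one tab-separated line per record. Tabs and line breaks inside a cell are collapsed to single spaces so they cannot break the row layout.

```python
def clean_cell(value):
    """Return a cell as text with tabs and line breaks collapsed to spaces."""
    text = "" if value is None else str(value)
    return " ".join(text.split())


def rows_to_tab_delimited(rows, columns):
    """Serialize records (dicts keyed by WOS tag) as tab-delimited text.

    `columns` fixes the column order, for example the header of the
    original export.
    """
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(clean_cell(row.get(col, "")) for col in columns))
    return "\n".join(lines) + "\n"
```

With pandas and an Excel engine such as openpyxl installed, the records could be obtained with pd.read_excel("WOS_Filtered.xlsx", dtype=str).fillna("").to_dict("records"); reading every column as text keeps values such as years and page numbers from being reformatted as floats.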


💡 How to Use

  1. Clone this repository:

    git clone https://github.com/NasserKhalili/wos-format-converter.git
    cd wos-format-converter
  2. Install dependencies:

    pip install pandas
  3. Run the appropriate script based on your workflow:

    python WOS_Converter_TabDelimited_to_xlsx_PlainText.py
    # OR
    python WOS_Converter_Filtered_xlsx_to_TabDelimitedText_PlainText.py

📂 Input/Output Examples

Initial Conversion:

  • Input: TabDelimited.txt
  • Output: WOS.xlsx, PlainText.txt

After Filtering:

  • Input: WOS_Filtered.xlsx
  • Output: TabDelimited_Filtered.txt, PlainText_Filtered.txt

🧪 Compatibility with Bibliometric Tools

✅ VOSviewer

The TabDelimited.txt file generated by this tool is suitable for VOSviewer, a software tool for constructing and visualizing bibliometric networks.

Cite VOSviewer as:
Van Eck, N. J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538.
https://www.vosviewer.com


✅ Bibliometrix (Biblioshiny in R)

The PlainText.txt output is compatible with the Bibliometrix R package and its web-based interface Biblioshiny for comprehensive science mapping analysis.

Cite Bibliometrix as:
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975.
https://www.bibliometrix.org


✅ CiteSpace

The PlainText.txt output is also compatible with CiteSpace, a Java-based application for visualizing and analyzing trends and patterns in scientific literature.

To ensure compatibility with CiteSpace:

  • Export records in Plain Text format with Full Record and Cited References
  • Name files as download_*.txt (e.g., download_1.txt)
  • Use data from supported sources like Web of Science or Scopus
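
The naming requirement can be handled with a small helper. This is a sketch under stated assumptions: stage_for_citespace is a hypothetical name, and the target folder is whatever data directory you point CiteSpace at. It copies each plain-text file into that folder as download_1.txt, download_2.txt, and so on, leaving the originals untouched.

```python
import shutil
from pathlib import Path


def stage_for_citespace(sources, data_dir):
    """Copy plain-text exports into data_dir as download_1.txt, download_2.txt, ...

    Returns the list of created paths in the same order as `sources`.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for index, source in enumerate(sources, start=1):
        destination = data_dir / f"download_{index}.txt"
        shutil.copyfile(source, destination)
        staged.append(destination)
    return staged
```

For example, stage_for_citespace(["PlainText_Filtered.txt"], "citespace_data") would create citespace_data/download_1.txt.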

Cite CiteSpace as:
Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359–377.
https://doi.org/10.1002/asi.20317


📌 Notes

  • The conversion preserves essential Web of Science tags (AU, TI, CR, etc.)
  • Cited references (CR) are correctly included in the plain-text output, unlike in the native WOS Excel export
  • Column mapping is based on the official WOS format standard
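
A quick way to confirm that cited references survived a conversion is to count the records that carry a CR tag. The function below is a hypothetical check, not part of the repository. It assumes the tagged layout in which each record ends with an ER line and a field starts with its two-letter tag followed by a space.

```python
def count_records_with_cr(text):
    """Return (total_records, records_with_cr) for tagged plain-text content."""
    total = 0
    with_cr = 0
    has_cr = False
    for line in text.splitlines():
        if line.startswith("CR "):
            has_cr = True
        elif line.rstrip() == "ER":
            # ER closes the current record.
            total += 1
            if has_cr:
                with_cr += 1
            has_cr = False
    return total, with_cr
```

If the second number is far below the first, the input export was probably downloaded without cited references.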

✍️ Author

Created by Nasser Khalili
If you use this tool in your research, feel free to give a ⭐ or cite the repository.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
