StayHomeLabNet/MojiSafe-Converter

✅ MojiSafe Converter

A safer TXT / CSV encoding converter that helps prevent mojibake and silent character loss

MojiSafe Converter converts TXT / CSV files between encodings such as UTF-8, Shift_JIS, and cp932,
while safely replacing unsupported characters instead of silently deleting them.

English | 日本語

Features | Install | Supported Encodings | Quick Start | Shortcuts | CLI | Development | License

Ko-fi Buy Me a Coffee

If MojiSafe Converter is useful to you, please consider supporting the project.


MojiSafe Converter is a Windows WinForms application that reads TXT / CSV files and saves them in a selected output encoding while safely replacing characters that cannot be represented by that encoding. Instead of silently deleting unsupported characters, it replaces them with dictionary-based alternatives or traceable fallback text. This helps detect and track variant characters in personal names and place names, symbols, invisible characters, and private-use-like characters before exporting data to legacy systems.
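The difference between silent loss and a traceable fallback can be illustrated with a minimal Python sketch. This is not code from MojiSafe Converter, which is a .NET application. The handler name traceable and the sample string are assumptions made for this example.

```python
import codecs

def traceable(exc):
    """Replace each unencodable character with a visible [U+XXXX] marker."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    marker = "".join(f"[U+{ord(ch):04X}]" for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return marker, exc.end

# "traceable" is a hypothetical handler name chosen for this sketch.
codecs.register_error("traceable", traceable)

text = "𠮷野"  # U+20BB7 is a name variant that cp932 cannot represent

# errors="ignore" drops the character without leaving any trace.
silent = text.encode("cp932", errors="ignore").decode("cp932")

# The custom handler keeps a marker that can be searched for later.
marked = text.encode("cp932", errors="traceable").decode("cp932")

print(silent)  # 野
print(marked)  # [U+20BB7]野
```

With the first call, nothing in the output shows that a character was lost. With the second, the marker records which code point could not be written.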

Features

  • TXT / CSV conversion
  • Direct .xlsx worksheet to CSV conversion
  • CSV support for quoted fields
  • Two-stage replacement: string replacement, then single-character replacement
  • Fallback handling for unmappable characters: [U+XXXX] / ? / 〓 / Drop
  • Replacement log, unmappable character summary, and diff view
  • CSV export for logs and unmappable summaries
  • Add, edit, delete, import, and export dictionary rules as CSV
  • Rule search, category filtering, and category-level enable / disable
  • Rule conflict checking
  • Rule application simulator
  • Rule test cases, bulk revalidation, and failure inspection
  • Automatic dictionary backup on save
  • Backup history, restore, and dictionary diff view
  • Profile support
  • Output name presets
  • Folder batch processing
  • Progress display and cancellation
  • Settings ZIP export/import
  • Preview before importing a settings ZIP
  • Automatic input encoding detection. Detection results are shown as estimates and must be confirmed before they are applied.
  • Modernized desktop UI with a unified light theme, flatter controls, cleaner tables, and improved visual hierarchy across dialogs
  • UI language switching. Japanese and English are currently supported, and UI strings are managed as JSON files under Data\Localization.

Install

  1. Download the latest release from GitHub Releases.
  2. Extract the ZIP file.
  3. Run MojiSafe Converter.exe.

Settings and dictionaries are saved in the Data folder next to the executable. To move settings to another PC, use Export settings / Import settings in the app.

Supported Encodings

The following encodings can be selected for input and output.

  • UTF-8
  • UTF-8 BOM
  • UTF-16 LE / BE
  • UTF-32 LE / BE
  • cp932
  • shift_jis
  • EUC-JP
  • ISO-2022-JP
  • windows-1252
  • ISO-8859-1
  • GB18030
  • Big5
  • EUC-KR

Characters that cannot be represented by the output encoding are handled by dictionary rules or fallback processing.

Quick Start

  1. Select an input file. .txt / .csv / .xlsx files can also be dragged and dropped into the input field.
  2. Select an output file. If omitted, a name with _converted is used, or an output name preset is applied.
  3. If the input file is .xlsx, select the worksheet with Sheet. The output file is handled as .csv.
  4. Select the input encoding, output encoding, and unmappable character handling mode.
  5. If needed, click Detect. The detected encoding is displayed as an estimate and can be applied after confirmation. Excel input does not use input encoding detection.
  6. Click Preview to review the log, unmappable character summary, and diff before saving.
  7. Click Run when the result looks correct.

The original file is never overwritten. If the output file already exists, the app asks for confirmation before running.

Shortcuts

The main workflow is designed to be usable from the GUI. Common actions are available from buttons such as Detect, Preview, Run, Edit rules, Export settings, and Import settings.

If keyboard shortcuts are added or changed in future versions, list them in this section so the navigation link remains stable.

Rule Regression Tests

The rule editor includes a lightweight regression test workflow for dictionary changes.

  1. Open Edit rules.
  2. Click Rule tests to add sample inputs and expected outputs.
  3. Click Run all to revalidate all enabled test cases against the current active rule set.
  4. Use Show failed results only to focus on failures.
  5. Open a failed result in Simulator to compare expected output with the current result.
  6. If the new result is correct, update the expected output from the simulator and save.

The simulator supports:

  • Saving the current simulation result as a new test case
  • Opening an existing failed result directly
  • Highlighting mismatched characters between expected output and the current result
  • Confirming before replacing an old expected output with a new one
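The revalidation idea can be sketched in a few lines of Python. This is only an illustration of the workflow, not the app's implementation. The names RuleTestCase, mismatch_positions, run_all, and apply_rules, along with the sample rule and test data, are hypothetical.

```python
from typing import Callable, NamedTuple

class RuleTestCase(NamedTuple):
    sample_input: str
    expected: str

def mismatch_positions(expected: str, actual: str) -> list[int]:
    """Indexes where the two strings differ, including any length overhang."""
    longest = max(len(expected), len(actual))
    return [
        i for i in range(longest)
        if i >= len(expected) or i >= len(actual) or expected[i] != actual[i]
    ]

def run_all(cases: list[RuleTestCase], apply: Callable[[str], str]):
    """Return (case, actual, mismatches) for every failing test case."""
    failures = []
    for case in cases:
        actual = apply(case.sample_input)
        if actual != case.expected:
            failures.append((case, actual, mismatch_positions(case.expected, actual)))
    return failures

def apply_rules(text: str) -> str:
    # Hypothetical active rule set: only the circled digit is replaced.
    return text.replace("①", "(1)")

cases = [
    RuleTestCase("①番", "(1)番"),   # passes with the rule above
    RuleTestCase("髙橋", "高橋"),    # fails: no rule maps 髙 to 高
]
failures = run_all(cases, apply_rules)
for case, actual, positions in failures:
    print(case.sample_input, "expected", case.expected, "got", actual, "at", positions)
```

A position-by-position comparison is the simplest way to point at mismatched characters. A real diff view would likely align insertions and deletions as well.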

CLI

The GUI application remains the main entry point, but a separate console project is available for single-file conversion from Command Prompt, PowerShell, batch files, or Task Scheduler. It also supports converting one worksheet from an .xlsx file to CSV.

dotnet run --project "Cli\MojiSafeConverter.Cli\MojiSafeConverter.Cli.csproj" -- convert "input.csv" "output.csv" --input-encoding utf-8 --output-encoding shift_jis

After publishing the CLI project, the executable can be called directly.

MojiSafeConverter.Cli.exe convert "input.csv" "output.csv" --input-encoding utf-8 --output-encoding shift_jis

Excel example:

MojiSafeConverter.Cli.exe convert "input.xlsx" "output.csv" --output-encoding shift_jis --sheet "Export"

Available options:

--input-encoding <name>
--output-encoding <name>
--fallback <[U+XXXX]|?|〓|Drop>
--rules <path-or-name>
--profile <name>
--sheet <name>
--preview

--rules accepts either a JSON file path or a file name under the runtime Data folder. --sheet selects the worksheet when the input file is .xlsx. --preview reports the planned changes without writing the output file or log.
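When the CLI is called from a script, building the argument list programmatically avoids shell quoting problems with fallback values such as ? or [U+XXXX]. The following Python sketch uses only the options listed above. The helper name build_convert_command is hypothetical, and the exit-code behavior of the CLI is not documented here, so the final call is left as a comment.

```python
def build_convert_command(exe, src, dst, *, input_encoding=None,
                          output_encoding=None, fallback=None, rules=None,
                          profile=None, sheet=None, preview=False):
    """Build an argument list for the documented `convert` command."""
    cmd = [exe, "convert", src, dst]
    options = {
        "--input-encoding": input_encoding,
        "--output-encoding": output_encoding,
        "--fallback": fallback,
        "--rules": rules,
        "--profile": profile,
        "--sheet": sheet,
    }
    # Only pass options that were explicitly given.
    for flag, value in options.items():
        if value is not None:
            cmd += [flag, value]
    if preview:
        cmd.append("--preview")
    return cmd

cmd = build_convert_command(
    "MojiSafeConverter.Cli.exe", "input.csv", "output.csv",
    input_encoding="utf-8", output_encoding="shift_jis",
    fallback="[U+XXXX]", preview=True,
)
print(cmd)

# To run it (assumes the published executable is on PATH):
# import subprocess
# subprocess.run(cmd)
```

Passing a list rather than a single command string means each value reaches the CLI as one argument, regardless of spaces or brackets.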

Help can be displayed with any of the following commands.

MojiSafeConverter.Cli.exe help
MojiSafeConverter.Cli.exe --help
MojiSafeConverter.Cli.exe help convert
MojiSafeConverter.Cli.exe convert --help

Replacement Order

Processing is performed in the following order.

  1. String replacement rules where RuleType = String
  2. Single-character replacement rules where RuleType = Character
  3. Fallback for characters that cannot be represented by the output encoding

String replacement rules are sorted by the following keys before they are applied.

  1. Priority ascending
  2. Source.Length descending
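The processing order described in this section can be sketched in Python as follows. This is a simplified model under stated assumptions, not the application's code: MatchType is ignored, string rules are treated as plain substring replacements, and the names Rule, can_encode, fallback_text, and convert are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    source: str
    target: str
    rule_type: str          # "String" or "Character"
    priority: int = 0
    enabled: bool = True

def can_encode(ch: str, encoding: str) -> bool:
    """True if the output encoding can represent this character."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def fallback_text(ch: str, mode: str) -> str:
    if mode == "[U+XXXX]":
        return f"[U+{ord(ch):04X}]"
    if mode == "Drop":
        return ""
    return mode             # "?" and "〓" are inserted literally

def convert(text: str, rules: list[Rule], encoding: str, mode: str = "[U+XXXX]") -> str:
    active = [r for r in rules if r.enabled]
    # Stage 1: string rules, Priority ascending, then longer Source first.
    string_rules = sorted(
        (r for r in active if r.rule_type == "String"),
        key=lambda r: (r.priority, -len(r.source)),
    )
    for rule in string_rules:
        text = text.replace(rule.source, rule.target)
    # Stage 2: single-character rules.
    char_map = {r.source: r.target for r in active if r.rule_type == "Character"}
    text = "".join(char_map.get(ch, ch) for ch in text)
    # Stage 3: fallback for anything the output encoding still cannot hold.
    return "".join(
        ch if can_encode(ch, encoding) else fallback_text(ch, mode) for ch in text
    )

rules = [
    Rule("No.①", "No.1", "String", priority=1),
    Rule("①", "(1)", "Character"),
    Rule("髙", "高", "Character"),
]
result = convert("髙橋 No.① ① 𠮷", rules, "shift_jis")
print(result)  # 高橋 No.1 (1) [U+20BB7]
```

Running string rules first lets a longer context such as No.① win over the generic single-character rule for ①. Sorting longer sources first within the same priority keeps a short rule from breaking up text that a longer rule was meant to match.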

Rule Model

Rules are saved as JSON.

Source
Target
Category
Note
Enabled
RuleType
MatchType
Priority

RuleType is Character or String, and MatchType is Contains or Exact.
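As an illustration, the fields above can be modeled and serialized in Python. The field names and the allowed RuleType / MatchType values come from this section. The default values, the value types, the sample rule, and the class name RuleRecord are assumptions, and the actual layout of replacement_rules.json may differ.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RuleRecord:
    Source: str
    Target: str
    Category: str = ""            # assumed default
    Note: str = ""                # assumed default
    Enabled: bool = True          # assumed default
    RuleType: str = "Character"   # "Character" or "String"
    MatchType: str = "Contains"   # "Contains" or "Exact"
    Priority: int = 0             # assumed default

    def __post_init__(self):
        # Reject values outside the documented sets.
        if self.RuleType not in ("Character", "String"):
            raise ValueError(f"unknown RuleType: {self.RuleType}")
        if self.MatchType not in ("Contains", "Exact"):
            raise ValueError(f"unknown MatchType: {self.MatchType}")

# Hypothetical sample rule.
rule = RuleRecord(Source="髙", Target="高", Category="Names")
encoded = json.dumps(asdict(rule), ensure_ascii=False, indent=2)
print(encoded)
```

Using ensure_ascii=False keeps characters such as 髙 readable in the JSON text instead of writing them as \u escapes.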

Data Folder And Runtime Storage

Settings and dictionaries are saved in the Data folder under the application runtime folder.

Example for a standalone distribution:

MojiSafe Converter.exe
Data\
  replacement_rules.json
  profiles.json
  output_name_presets.json
  app_settings.json
  Localization\
    Strings.ja.json
    Strings.en.json
  Backups\
  PackageImportBackups\

When distributing the folder produced by dotnet publish, the app uses the Data folder next to the MojiSafe Converter.exe that the user runs.

To move dictionaries and settings to another PC, use the app's Export settings / Import settings features.

The display language is saved in app_settings.json. If language files do not exist, the app creates default Japanese / English JSON files under Data\Localization at startup.

Dictionary Backups And Diffs

When a dictionary is saved, the previous JSON file is backed up with a timestamp under Data\Backups.

Backup history supports the following operations.

  • Restore a dictionary backup
  • Restore a backup created before settings import
  • Open the backup location
  • Select two dictionary backups and view their diff

The dictionary diff view shows:

  • Added rules
  • Deleted rules
  • Rules whose Target changed
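The three categories above can be computed with a simple keyed comparison. The Python sketch below assumes each rule is identified by its RuleType and Source, which is an assumption made for this example; the function name diff_rules and the sample rules are hypothetical.

```python
def diff_rules(old: list[dict], new: list[dict]):
    """Compare two rule lists keyed by (RuleType, Source)."""
    def key(rule):
        return (rule["RuleType"], rule["Source"])

    old_by_key = {key(r): r for r in old}
    new_by_key = {key(r): r for r in new}

    added = [r for k, r in new_by_key.items() if k not in old_by_key]
    deleted = [r for k, r in old_by_key.items() if k not in new_by_key]
    changed = [
        (old_by_key[k], new_by_key[k])
        for k in old_by_key
        if k in new_by_key and old_by_key[k]["Target"] != new_by_key[k]["Target"]
    ]
    return added, deleted, changed

old_rules = [
    {"RuleType": "Character", "Source": "髙", "Target": "高"},
    {"RuleType": "Character", "Source": "①", "Target": "1"},
    {"RuleType": "String", "Source": "㈱", "Target": "(株)"},
]
new_rules = [
    {"RuleType": "Character", "Source": "髙", "Target": "高"},
    {"RuleType": "Character", "Source": "①", "Target": "(1)"},
    {"RuleType": "Character", "Source": "﨑", "Target": "崎"},
]
added, deleted, changed = diff_rules(old_rules, new_rules)
print(len(added), len(deleted), len(changed))  # 1 1 1
```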

Folder Batch Processing

Batch folder processes the TXT / CSV files located directly under the selected input folder. Original files are not modified; converted files are saved to the output folder.

The following tokens can be used in output names.

{name}
{ext}
{preset}
{yyyyMMdd}
{yyyyMMdd_HHmmss}

Examples:

{name}_converted{ext}
{preset}{ext}
{name}_{yyyyMMdd}{ext}
{preset}_{name}{ext}

During batch processing, the app displays the file count, current file name, and progress bar. Processing can also be canceled.

If any file fails, batch_errors_yyyyMMdd_HHmmss.csv is saved to the output folder.

Settings ZIP Export/Import

Export settings packages the following files into a ZIP.

  • Profiles
  • Dictionary JSON files
  • Output name presets
  • App settings

Import settings shows a preview before importing.

Example:

Profiles: 3
Dictionary files: 4
Output name presets: 2
App settings: present

Before the import is applied, the current settings are backed up under Data\PackageImportBackups. The imported files then overwrite the current settings.

Development

This section is for building, publishing, and testing the project from source.

Build

Open MojiSafe Converter.sln in Visual Studio and build it. The application name shown in the solution is MojiSafe Converter.

From the command line:

dotnet build "MojiSafe Converter.csproj"

Publish

Because this folder contains multiple projects, specify the project file when publishing.

dotnet publish "MojiSafe Converter.csproj" -c Release -r win-x64 --self-contained true -p:PublishSingleFile=true

Example output folder:

bin\Release\net8.0-windows\win-x64\publish\

Distribute all files in the publish folder. After the app runs, dictionaries and settings are saved in the Data folder next to the executable.

The executable name is MojiSafe Converter.exe. The project file has also been renamed to MojiSafe Converter.csproj.

Tests

Lightweight console tests are stored under Tests\MojiSafeConverter.Tests.

dotnet run --project "Tests\MojiSafeConverter.Tests\MojiSafeConverter.Tests.csproj"

In network-restricted environments, NuGet vulnerability information warnings may appear. The tests are successful if they end with All tests passed.

If packages have already been restored and you want to reduce warnings:

dotnet run --no-restore --project "Tests\MojiSafeConverter.Tests\MojiSafeConverter.Tests.csproj"

Troubleshooting

Text Looks Garbled

  • Check that the input encoding is correct.
  • Detect is an estimate, not a guarantee. Review the detected result before applying it.
  • When opening CSV files in Excel, also check the original CSV encoding and Excel's import settings.
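The reason detection can only be an estimate is that the same bytes are often valid in more than one encoding. The Python sketch below is not the app's detection logic; it only shows which candidate encodings decode a sample without error. The function name plausible_encodings and the candidate list are assumptions for this example.

```python
def plausible_encodings(data: bytes, candidates: list[str]) -> list[str]:
    """Return the candidate encodings that decode the bytes without error."""
    accepted = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        accepted.append(name)
    return accepted

data = "日本語".encode("cp932")
result = plausible_encodings(data, ["utf-8", "cp932", "iso-8859-1"])
print(result)  # ['cp932', 'iso-8859-1']

# Both decodes succeed, but only one of them is readable text.
for name in result:
    print(name, repr(data.decode(name)))
```

ISO-8859-1 accepts every byte sequence, so a successful decode alone says nothing about whether the encoding is right. This is why the detected result should be reviewed in the preview before it is applied.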

Text On The Screen Is Hard To Read

Display depends on the Japanese font environment in Windows. The default font should normally work, but if text is distorted, check the Windows display language, Japanese supplemental fonts, and DPI settings.

You Are Unsure About ZIP Import

A preview is shown before import. Check the number of profiles, dictionary files, and output name presets before continuing.

The settings that existed before the import are backed up under Data\PackageImportBackups. If needed, they can be restored from Backup history.

The Output File Already Exists

The app asks for confirmation before overwriting the output file. The original input file is never overwritten, but the output file can be overwritten after confirmation.

MSB1011 Appears During Publish

This happens because the folder contains multiple projects. Specify the .csproj file explicitly.

dotnet publish "MojiSafe Converter.csproj" -c Release -r win-x64 --self-contained true -p:PublishSingleFile=true

License

This project is released under the MIT License. See the LICENSE file for details.
