BUG: wrong foxpro DBF file read #3243

Open
JonhSilver opened this issue Apr 7, 2024 · 7 comments

Comments

@JonhSilver

gpd.read_file(self.dbf_path, encoding='cp1257')

returns column values as
0 ļ?
1 »7
2 _6
3 Ó6

instead of
0 2552552552552552552552552552552552552552552552...
1 0680000482540310080000000000000000000000000000...
2 0842512262552550152482272551950310000000000001...
3 0840042362542550090000040000002240030000000000...

@martinfleis
Member

Can you share the files and details of your environment? At least the output of geopandas.show_versions().

@JonhSilver
Author

JonhSilver commented Apr 7, 2024

SYSTEM INFO

python : 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)]
executable : C:\Program Files\Python38\python.exe
machine : Windows-10-10.0.20348-SP0

GEOS, GDAL, PROJ INFO

GEOS : 3.11.3
GEOS lib : None
GDAL : 3.6.4
GDAL data dir: C:\Program Files\Python38\lib\site-packages\fiona\gdal_data
PROJ : 9.2.0
PROJ data dir: C:\Program Files\Python38\lib\site-packages\pyproj\proj_dir\share\proj

PYTHON DEPENDENCIES

geopandas : 0.13.2
numpy : 1.24.4
pandas : 2.0.3
pyproj : 3.5.0
shapely : 2.0.3
fiona : 1.9.6
geoalchemy2: None
geopy : None
matplotlib : None
mapclassify: None
pygeos : None
pyogrio : 0.7.2
psycopg2 : None
pyarrow : None
rtree : None
None

@JonhSilver
Author

JonhSilver commented Apr 7, 2024

I think it reads only the DBF file and ignores the FPT file that holds the data for the DBF memo columns.

R05_USER.DBF.zip
Just rename it to R05_USER.DBF.

R05_USER.FPT.zip

2024-04-07_220628

@brendan-ward
Member

I'm not familiar with .fpt files and I don't think they are used within the GeoPandas stack. Fiona (used by GeoPandas by default in your version) and pyogrio (also installed in your env) both use GDAL to read DBF files as part of the ESRI Shapefile driver.

GDAL automatically detects that the DBF file is in CP1257 based on the code page set in the DBF file, so it attempts to automatically convert to UTF-8. Likewise, if you provide `encoding="cp1257"`, it attempts to use that encoding to convert to UTF-8.

However, it appears that perhaps that is not the correct encoding for this file? GDAL emits this warning before returning scrambled text:
Warning 1: One or several characters couldn't be converted correctly from CP1257 to UTF-8. This warning will not be emitted anymore

It isn't clear from your example above what values are correct here; in the DBF file the values in R05_LAUKAS through R05_LAUKA7 all appear to be single character values.
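
To check which code page the file declares and which columns are memo fields, the DBF header can be inspected directly. This is a minimal sketch using only the standard library, assuming the usual dBase/FoxPro header layout (code page mark at byte 29, 32-byte field descriptors terminated by 0x0D); `describe_dbf_header` is a hypothetical helper, not part of GeoPandas, Fiona, pyogrio, or GDAL.

```python
import struct


def describe_dbf_header(data: bytes) -> dict:
    """Summarise the header of a dBase/FoxPro table given its raw bytes."""
    # Bytes 0-11: version, last-update date (3 bytes), record count,
    # header length, record length (integers are little-endian).
    version, _yy, _mm, _dd, n_records, header_len, record_len = struct.unpack_from(
        "<4BIHH", data, 0
    )
    fields = []
    pos = 32  # field descriptors start right after the 32-byte file header
    # Each descriptor is 32 bytes; the list ends with a 0x0D terminator byte.
    while pos + 32 <= min(header_len, len(data)) and data[pos] != 0x0D:
        name = data[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii", "replace")
        field_type = chr(data[pos + 11])  # e.g. "C" character, "N" numeric, "M" memo
        length = data[pos + 16]
        fields.append((name, field_type, length))
        pos += 32
    return {
        "version": version,
        "code_page_mark": data[29],  # language driver ID byte
        "n_records": n_records,
        "record_length": record_len,
        "fields": fields,
    }
```

Calling it on the bytes of R05_USER.DBF (for example `describe_dbf_header(open("R05_USER.DBF", "rb").read())`) should list each column with its type letter; columns reported as type `M` would be memo fields whose content lives outside the DBF file.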

@theroggy
Member

theroggy commented Apr 8, 2024

I tried opening the file in ArcMap and it doesn't open it at all, so it seems the .dbf diverges at least slightly from the typical shapefile .dbf.

@JonhSilver
Author

You can open the DBF file using Visual FoxPro 9 (VFP9):
https://github.com/VFPX/VFPInstallers or
https://www.refox.net/

@theroggy
Member

theroggy commented Apr 8, 2024

I'm not sure, but I think this .dbf will be too "modern" or "advanced" to be read by a parser intended to read ESRI shapefiles.

A theory, loosely based on a quick scan of this page: I guess that the fields that don't show the correct data are "memo" fields, which hold longer text. If I understand correctly, what the .dbf file stores for a "memo" field is only a reference to where the real data can be found in the .fpt file. GDAL (and hence fiona and pyogrio) only supports .dbf files as described by the ESRI shapefile format (a subset of the dBase 4 and 5 formats, I think), so it doesn't use the "modern"/"advanced" .fpt file and cannot show the data in the memo field. Instead it shows the data in the .dbf, which are just a few seemingly random characters, because they are meant as pointers/offsets into the .fpt file rather than real data.
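
To illustrate the pointer idea, here is a rough sketch of how a memo could be looked up by hand. It assumes the Visual FoxPro layout: a 4-byte little-endian block number in the DBF record, a big-endian block size at bytes 6-7 of the .fpt header, and memo blocks that start with a 4-byte type and a 4-byte length (both big-endian). Older FoxPro tables store the block number as 10 ASCII digits instead, which this sketch does not handle. All function names are hypothetical, and `build_demo_fpt` only fabricates a tiny in-memory file for demonstration.

```python
import struct


def memo_pointer(raw_field: bytes) -> int:
    """Interpret a 4-byte memo field from a DBF record as a block number."""
    return struct.unpack("<I", raw_field)[0]  # assumed little-endian (Visual FoxPro)


def read_fpt_memo(fpt: bytes, block_number: int) -> bytes:
    """Return the raw memo bytes stored at block_number in .fpt data."""
    if block_number == 0:
        return b""  # assumption: block 0 means "no memo stored"
    block_size = struct.unpack_from(">H", fpt, 6)[0]  # header bytes 6-7, big-endian
    offset = block_number * block_size
    # Memo block header: 4-byte type, 4-byte data length, both big-endian.
    _block_type, length = struct.unpack_from(">II", fpt, offset)
    return fpt[offset + 8:offset + 8 + length]


def build_demo_fpt(memo: bytes, block_size: int = 64):
    """Build a tiny in-memory .fpt with one text memo (demo data only)."""
    first_block = 512 // block_size  # data starts after the 512-byte header
    n_blocks = -(-(8 + len(memo)) // block_size)  # ceiling division
    header = struct.pack(">IHH", first_block + n_blocks, 0, block_size)
    header = header.ljust(512, b"\x00")
    block = (struct.pack(">II", 1, len(memo)) + memo).ljust(n_blocks * block_size, b"\x00")
    return header + block, first_block


fpt_bytes, block = build_demo_fpt(b"2552552552")
pointer_bytes = struct.pack("<I", block)  # what the DBF record would hold
print(read_fpt_memo(fpt_bytes, memo_pointer(pointer_bytes)))
```

Decoding such pointer bytes as CP1257 text would produce a couple of odd characters per field, which would be consistent with the output reported above, though that has not been verified against the attached files. The memo bytes returned here still need to be decoded by the caller.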
