New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column Lengths Don't Reflect Actual Data #68

Closed
anabranch opened this Issue Nov 11, 2014 · 5 comments

Comments

Projects
None yet
2 participants
@anabranch
Contributor

anabranch commented Nov 11, 2014

I'm working #38 and the data validation is killing me at every step of the way.

For example in AcronymsCd:
I'm getting DataErrors when I try to insert into PostgreSQL.

value too long for type character varying(25)
CONTEXT:  COPY temporary_table, line 480, column ACRONYM: "ILWU POLITICAL ACTION FUND"

The problem is that the char field isn't long enough to support this acronym.

class AcronymsCd(CalAccessBaseModel):
    DATE_FIELDS = ("EFFECT_DT",)
    acronym = models.CharField(max_length=25, db_column="ACRONYM")
    stands_for = models.CharField(max_length=4, db_column="STANDS_FOR")
    effect_dt = models.DateField(db_column="EFFECT_DT")
    a_desc = models.CharField(max_length=25, db_column="A_DESC")

I'd like to change these column sizes to reflect the actual data, is that something that can be supported?

@anabranch anabranch changed the title from Table Lengths Don't Reflect Actual Data to Column Lengths Don't Reflect Actual Data Nov 11, 2014

@anabranch

This comment has been minimized.

Show comment
Hide comment
@anabranch

anabranch Nov 11, 2014

Contributor
class FilernameCd(CalAccessBaseModel):
    DATE_FIELDS = ('EFFECT_DT',)
    xref_filer_id = models.CharField(
        max_length=7L,
        db_column='XREF_FILER_ID',
        db_index=True,
        help_text="The external filer id saved in the forms tables"
    )
$ head -1 filername_cd.csv
"XREF_FILER_ID","FILER_ID","FILER_TYPE","STATUS","EFFECT_DT","NAML","NAMF","NAMT","NAMS","ADR1","ADR2","CITY","ST","ZIP4","PHON","FAX","EMAIL"
$ grep -E "R98\*-\d+" filername_cd.csv | wc -l
     609

An example from that grep list

"XREF_FILER_ID",....
"R98*-185420615",....

(that's 14 in length, not 7)
You can also see they're getting truncated in the mysql database

mysql> SELECT XREF_FILER_ID FROM FILERNAME_CD where XREF_FILER_ID like 'R98*-18%';
+---------------+
| XREF_FILER_ID |
+---------------+
| R98*-18       |
| R98*-18       |
| R98*-18       |
....
+---------------+
29 rows in set (0.00 sec)

This list goes on with these issues. Pull request is almost ready...

Contributor

anabranch commented Nov 11, 2014

class FilernameCd(CalAccessBaseModel):
    DATE_FIELDS = ('EFFECT_DT',)
    xref_filer_id = models.CharField(
        max_length=7L,
        db_column='XREF_FILER_ID',
        db_index=True,
        help_text="The external filer id saved in the forms tables"
    )
$ head -1 filername_cd.csv
"XREF_FILER_ID","FILER_ID","FILER_TYPE","STATUS","EFFECT_DT","NAML","NAMF","NAMT","NAMS","ADR1","ADR2","CITY","ST","ZIP4","PHON","FAX","EMAIL"
$ grep -E "R98\*-\d+" filername_cd.csv | wc -l
     609

An example from that grep list

"XREF_FILER_ID",....
"R98*-185420615",....

(that's 14 in length, not 7)
You can also see they're getting truncated in the mysql database

mysql> SELECT XREF_FILER_ID FROM FILERNAME_CD where XREF_FILER_ID like 'R98*-18%';
+---------------+
| XREF_FILER_ID |
+---------------+
| R98*-18       |
| R98*-18       |
| R98*-18       |
....
+---------------+
29 rows in set (0.00 sec)

This list goes on with these issues. Pull request is almost ready...

@palewire

This comment has been minimized.

Show comment
Hide comment
@palewire

palewire Nov 14, 2014

Member

Sorry about the hassles. Has the merge in #69 closed the ticket?

Member

palewire commented Nov 14, 2014

Sorry about the hassles. Has the merge in #69 closed the ticket?

@anabranch

This comment has been minimized.

Show comment
Hide comment
@anabranch

anabranch Nov 14, 2014

Contributor

Indeed it has. I'll open another issue if it comes up again.

Contributor

anabranch commented Nov 14, 2014

Indeed it has. I'll open another issue if it comes up again.

@palewire palewire closed this Nov 14, 2014

@palewire

This comment has been minimized.

Show comment
Hide comment
@palewire

palewire Nov 14, 2014

Member

Okay. The official documentation from the state does list character lengths if we ever need a source, though I trust your experience with the actual data over any documentation.

Member

palewire commented Nov 14, 2014

Okay. The official documentation from the state does list character lengths if we ever need a source, though I trust your experience with the actual data over any documentation.

@anabranch

This comment has been minimized.

Show comment
Hide comment
@anabranch

anabranch Nov 14, 2014

Contributor

Interesting - It just kept coming up because I think postgres is a bit more strict when it comes to the COPY command. I'm prepping a whole write up for the postgres changes so we've got traceability for the changes that I made (and why I made them).

Contributor

anabranch commented Nov 14, 2014

Interesting - It just kept coming up because I think postgres is a bit more strict when it comes to the COPY command. I'm prepping a whole write up for the postgres changes so we've got traceability for the changes that I made (and why I made them).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment