Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column Lengths Don't Reflect Actual Data #68

Closed
anabranch opened this issue Nov 11, 2014 · 5 comments
Closed

Column Lengths Don't Reflect Actual Data #68

anabranch opened this issue Nov 11, 2014 · 5 comments

Comments

@anabranch
Copy link
Contributor

@anabranch anabranch commented Nov 11, 2014

I'm working #38 and the data validation is killing me at every step of the way.

For example in AcronymsCd:
I'm getting DataErrors when I try to insert into PostgreSQL.

value too long for type character varying(25)
CONTEXT:  COPY temporary_table, line 480, column ACRONYM: "ILWU POLITICAL ACTION FUND"

The problem is that the char field isn't long enough to support this acronym.

class AcronymsCd(CalAccessBaseModel):
    DATE_FIELDS = ("EFFECT_DT",)
    acronym = models.CharField(max_length=25, db_column="ACRONYM")
    stands_for = models.CharField(max_length=4, db_column="STANDS_FOR")
    effect_dt = models.DateField(db_column="EFFECT_DT")
    a_desc = models.CharField(max_length=25, db_column="A_DESC")

I'd like to change these column sizes to reflect the actual data, is that something that can be supported?

@anabranch anabranch changed the title Table Lengths Don't Reflect Actual Data Column Lengths Don't Reflect Actual Data Nov 11, 2014
@anabranch
Copy link
Contributor Author

@anabranch anabranch commented Nov 11, 2014

class FilernameCd(CalAccessBaseModel):
    DATE_FIELDS = ('EFFECT_DT',)
    xref_filer_id = models.CharField(
        max_length=7L,
        db_column='XREF_FILER_ID',
        db_index=True,
        help_text="The external filer id saved in the forms tables"
    )
$ head -1 filername_cd.csv
"XREF_FILER_ID","FILER_ID","FILER_TYPE","STATUS","EFFECT_DT","NAML","NAMF","NAMT","NAMS","ADR1","ADR2","CITY","ST","ZIP4","PHON","FAX","EMAIL"
$ grep -E "R98\*-\d+" filername_cd.csv | wc -l
     609

An example from that grep list

"XREF_FILER_ID",....
"R98*-185420615",....

(that's 14 in length, not 7)
You can also see they're getting truncated in the mysql database

mysql> SELECT XREF_FILER_ID FROM FILERNAME_CD where XREF_FILER_ID like 'R98*-18%';
+---------------+
| XREF_FILER_ID |
+---------------+
| R98*-18       |
| R98*-18       |
| R98*-18       |
....
+---------------+
29 rows in set (0.00 sec)

This list goes on with these issues. Pull request is almost ready...

@palewire
Copy link
Member

@palewire palewire commented Nov 14, 2014

Sorry about the hassles. Has the merge in #69 closed the ticket?

@anabranch
Copy link
Contributor Author

@anabranch anabranch commented Nov 14, 2014

Indeed it has. I'll open another issue if it comes up again.

@palewire palewire closed this Nov 14, 2014
@palewire
Copy link
Member

@palewire palewire commented Nov 14, 2014

Okay. The official documentation from the state does list character lengths if we ever need a source, though I trust your experience with the actual data over any documentation.

@anabranch
Copy link
Contributor Author

@anabranch anabranch commented Nov 14, 2014

Interesting - It just kept coming up because I think postgres is a bit more strict when it comes to the COPY command. I'm prepping a whole write up for the postgres changes so we've got traceability for the changes that I made (and why I made them).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.