imdbpy2sql.py never completes (OS X 10.7.5 Lion) #19

Closed
ambrosechapel opened this issue Oct 7, 2014 · 13 comments

@ambrosechapel

I've tried to run imdbpy2sql.py three times now and each time it gets to the stage "adding foreign keys" and stalls. I left it running for more than 24 hours then gave up. Running it again I can see it get to around 47/48 minutes CPU time in Activity Monitor (which is like a GUI version of the top command) then nothing happens. There's no disk activity and the memory usage is zero.

I set the computer never to sleep and ran the command through nohup in case that was a factor but the same thing happened.

As it's done most of the work except for the foreign keys, is there a way I can re-run it, skipping everything else and just doing the foreign keys?

@alberanid
Collaborator

On Tue, Oct 7, 2014 at 10:08 PM, ambrosechapel notifications@github.com wrote:

I've tried to run imdbpy2sql.py three times now and each time it gets to the stage "adding foreign keys" and stalls.

Yep, that's a known problem.
In fact, once you've stopped it, the database should contain all
the data and the indexes.
Only some foreign keys (with their indexes) are missing.

If this causes performance problems, you can easily
recreate them.
The schema of the database can be found in the imdb/parser/sql/dbschema.py
file.
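
A minimal sketch of what recreating the missing keys could look like. It only builds `ALTER TABLE` statements as strings, which you would then review and run yourself. The table/column pairs below are illustrative assumptions, not copied from dbschema.py, and the helper name `foreign_key_statements` is hypothetical; check the schema file for the real list.

```python
# Hypothetical (table, column, referenced_table) triples; verify each one
# against imdb/parser/sql/dbschema.py before using them.
ASSUMED_FOREIGN_KEYS = [
    ("cast_info", "movie_id", "title"),
    ("movie_info", "movie_id", "title"),
]


def foreign_key_statements(keys):
    """Build one ALTER TABLE statement per (table, column, ref_table) triple.

    Assumes every referenced table uses a primary key column named "id".
    """
    statements = []
    for table, column, ref_table in keys:
        statements.append(
            "ALTER TABLE {t} ADD FOREIGN KEY ({c}) REFERENCES {r} (id);".format(
                t=table, c=column, r=ref_table
            )
        )
    return statements


if __name__ == "__main__":
    for statement in foreign_key_statements(ASSUMED_FOREIGN_KEYS):
        print(statement)
```

Printing the statements first, rather than executing them, makes it easy to run them one at a time and see which one is slow.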

As always, any help fixing the real problem would be
greatly appreciated.

Davide Alberani da@mimante.net [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

@ambrosechapel
Author

Hi, thanks for responding. I have now figured out the problem and solved it successfully for me at least (XAMPP on MacBook/iMac).

The problem was that the tables were being created as InnoDB by default.

I added this to the command line and everything completed successfully:

    -e "AFTER_CREATE:FOR_EVERY_TABLE:ALTER TABLE %(table)s ENGINE=MyISAM;"
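
The `%(table)s` placeholder is ordinary Python mapping-style string formatting. The sketch below shows how such a template could expand into one statement per table. The table names and the `expand_for_every_table` helper are assumptions for illustration, not the script's actual internals.

```python
TEMPLATE = "ALTER TABLE %(table)s ENGINE=MyISAM;"


def expand_for_every_table(template, table_names):
    """Return the template filled in once for each table name."""
    return [template % {"table": name} for name in table_names]


if __name__ == "__main__":
    # Illustrative table names only.
    for statement in expand_for_every_table(TEMPLATE, ["title", "movie_info"]):
        print(statement)
```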

@zizhaozhang

Hi, may I ask how you solved this problem? I ran into it too. Thanks so much!

Best,
Zizhao

@ambrosechapel
Author

As I said above, I added an option to the command. The documentation says:

To create the tables and to populate the database, you must run
the imdbpy2sql.py script:

    # imdbpy2sql.py -d /dir/with/plainTextDataFiles/ -u 'URI'

but if you add my extra option from above you get

    imdbpy2sql.py -d /dir/with/plainTextDataFiles/ -u 'URI' -e "AFTER_CREATE:FOR_EVERY_TABLE:ALTER TABLE %(table)s ENGINE=MyISAM;"

and that should work.

@ambrosechapel
Author

My recommendation is that the documentation explicitly state that there's no guarantee the script will ever complete with InnoDB, and that using MyISAM is highly recommended.

@zizhaozhang

I got this. Have you looked at the data in this SQL database? I found that
a lot of information (features) is missing from it. I know that
imdbpy2sql converts the raw data downloaded from a mirror
website into SQL, but most of the detailed information that the raw data
has does not seem to be in the SQL database.
Thanks

Best,
Zizhao

@ambrosechapel
Author

Which files did you download? The documentation recommends you download every .gz file in this folder:

ftp://ftp.funet.fi/pub/mirrors/ftp.imdb.com/pub/

@zizhaozhang

I know, I did the same. But I think imdbpy2sql does not save the detailed
information. For example, can you find the budget and gross of a specific
movie in the SQL database? That information should come from
"business.list.gz".
Or did I miss something?

Thanks

Best,
Zizhao

@ambrosechapel
Author

I hadn't looked at that part of the information I'm afraid, and I'm not at the right computer now to check the database. Maybe alberanid can answer that.

@zizhaozhang

OK, give it a try when you can. I have also been doing this kind of work recently.
I hope we can discuss more soon.

Best,
Zizhao

@ambrosechapel
Author

I have done an import successfully on this computer and I can see the budget information.

According to the info_type table, budget information is type 105.

So if I look in movie_info for info_type_id = 105, I see sums of money. They are a bit awkward because they are prefixed with a currency symbol such as "$" or "£", so sorting is pretty much impossible. But they are there. Hope this helps.
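
One possible way around the sorting problem is to split the currency prefix from the number after fetching the rows. This is only a sketch: it uses an in-memory SQLite table as a stand-in for the real database, the column names `movie_id` and `info` and the sample rows are assumptions, and the id 105 is the one seen in this import and may differ in yours. `parse_amount` and `budgets` are hypothetical helper names.

```python
import re
import sqlite3

# A currency prefix (symbol or code), optional space, then digits with commas.
AMOUNT_RE = re.compile(r"^\s*([^\d\s]+)\s*(\d[\d,]*)")


def parse_amount(info):
    """Split a string such as '$1,000,000' into ('$', 1000000).

    Returns None when the string does not start with a currency prefix
    followed by digits.
    """
    match = AMOUNT_RE.match(info)
    if match is None:
        return None
    currency, digits = match.groups()
    return currency, int(digits.replace(",", ""))


def budgets(conn, info_type_id=105):
    """Return (movie_id, currency, amount) rows sorted by currency, then amount."""
    rows = conn.execute(
        "SELECT movie_id, info FROM movie_info WHERE info_type_id = ?",
        (info_type_id,),
    ).fetchall()
    parsed = []
    for movie_id, info in rows:
        amount = parse_amount(info)
        if amount is not None:
            parsed.append((movie_id,) + amount)
    return sorted(parsed, key=lambda row: (row[1], row[2]))


if __name__ == "__main__":
    # Toy data standing in for the real movie_info table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movie_info (movie_id INTEGER, info_type_id INTEGER, info TEXT)"
    )
    conn.executemany(
        "INSERT INTO movie_info VALUES (?, ?, ?)",
        [(1, 105, "$1,000,000"), (2, 105, "£500,000"), (3, 105, "$250,000")],
    )
    for row in budgets(conn):
        print(row)
```

Amounts in different currencies are not comparable without conversion, so the sketch groups by currency first and only sorts numerically within each group. With a MySQL driver the `?` placeholder would typically be `%s`.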

@alberanid
Collaborator

On Sun, Oct 19, 2014 at 7:13 AM, ambrosechapel notifications@github.com wrote:

So if I look in movie_info for info_type_id =105 I see sums of money.

Yep, every piece of information from the plain text files is
available via IMDbPY.

From the business list we have: 'budget', 'weekend gross', 'gross',
'opening weekend', 'rentals', 'admissions', 'filming dates', 'production dates',
'studios', 'copyright holder'.
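
To find which info_type_id each of these names maps to in a given import, a lookup along these lines might help. It is a sketch against an in-memory SQLite stand-in; the `id` and `info` column names of info_type are assumptions, the sample ids other than 105 are made up, and `info_type_ids` is a hypothetical helper.

```python
import sqlite3

BUSINESS_INFO_NAMES = [
    "budget", "weekend gross", "gross", "opening weekend", "rentals",
    "admissions", "filming dates", "production dates", "studios",
    "copyright holder",
]


def info_type_ids(conn, names):
    """Map each info name found in info_type to its id."""
    placeholders = ", ".join("?" for _ in names)
    query = "SELECT info, id FROM info_type WHERE info IN (%s)" % placeholders
    return dict(conn.execute(query, list(names)).fetchall())


if __name__ == "__main__":
    # Toy table; only the 105/budget pair comes from this thread.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE info_type (id INTEGER, info TEXT)")
    conn.executemany(
        "INSERT INTO info_type VALUES (?, ?)",
        [(105, "budget"), (107, "gross"), (1, "runtimes")],
    )
    print(info_type_ids(conn, BUSINESS_INFO_NAMES))
```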

Davide Alberani da@mimante.net [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

@zizhaozhang

Thanks all. I appreciate your help.

Best,
Zizhao
