
Hash GL_2 modular forms #2123

Open · jvoight opened this issue Jun 16, 2017 · 21 comments
Labels: feature request

@jvoight (Member) commented Jun 16, 2017:

It would be useful to have a hash for GL_2 modular forms. (We used it quite often in the genus 2 work.)

We propose following a scheme like section 4.3 of https://arxiv.org/pdf/1602.03715: something like

    h(f) = sum_{pp} c_pp * Tr_{K/QQ}(a_pp(f)) mod P,

where K is the Hecke field and the pp are primes, canonically ordered, with norms in the range [4096, 8192].
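A minimal sketch of how such a hash might be computed (illustrative only: the modulus P, the choice of the coefficients c_pp as powers of a fixed constant, and all names here are assumptions, not a settled specification):

```python
# Hypothetical sketch of the proposed trace hash; not LMFDB's actual scheme.
P = 2**61 - 1   # a fixed prime modulus (the hashes discussed below use 61 bits)
R = 2**31 - 1   # a fixed seed generating the coefficients c_pp

def trace_hash(traces):
    """traces: the integers Tr_{K/QQ}(a_pp(f)), listed in the canonical
    ordering of the primes pp with norm in [4096, 8192]."""
    h, c = 0, 1
    for t in traces:
        c = (c * R) % P        # deterministic coefficient c_pp
        h = (h + c * t) % P
    return h
```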

jvoight added the feature request label on Jun 16, 2017
@jwbober (Member) commented Jun 17, 2017:

That seems reasonable, but I don't think we're ever actually going to have that data for most classical modular forms, and we certainly won't have it soon. I have been computing a_p(f) for p < 2000, which is already more than we need for the L-functions in the range I've been working in, and that already takes a long time. I realize that there may be good reasons for skipping the smallest primes when computing this hash, but I wonder how much it really matters. (A hash wouldn't need to be the same for every type of modular form, though.)

Also, I would like to think that eventually a 64-bit hash will be too small, but we're probably pretty far from that being a problem.

@AndrewVSutherland (Member) commented Jun 17, 2017:

There is no reason to force the hash to depend on primes in [2^12,2^13], we did that only for convenience in the early stages of the genus 2 project. I think it is useful to skip the very small primes so that you can compute the hash without having to determine Euler factors that might be particularly difficult to compute in extreme cases. I would be happy with a hash that uses the a_p values for primes p in [2^5,2^9], say.

This should still be more than enough data to reasonably extract 128 bits (I agree that 64 bits is not enough, and if we are going to define a new hash, we may as well move to 128 bits now).

@arbooker (Contributor) commented:

There are only 86 primes in [2^5,2^9]. Since JB has data up to 2000, how about [100,2000]?

@AndrewVSutherland (Member) commented:

@arbooker I'm thinking more generally about cases where getting a_p's for p up to 512 might be feasible but getting to 2000 might not be. I take it you are worried that 86 a_p's might not be enough to get 128 reasonably distributed bits. I suppose in CM cases a lot of the a_p's might be zero, but there is still information in knowing which a_p's are zero (and our hash function captures this). I also note that when the Hecke field and/or the base field is larger than Q we are getting more information. (Side note: in J.V.'s suggestion the hash coefficients c_pp depend on the base field, so we are really defining a different hash function for each base field; of course in the classical case, where we will surely have the most data, the base field is always the same.)

Do you really think there is a non-negligible chance that we will encounter two distinct L-functions whose a_p's coincide for p up to 512?

@arbooker (Contributor) commented:

@AndrewVSutherland It's unclear to me how many coefficients one should expect to need. Under GRH, any two eigenforms are distinguished by very few a_p's (O(log^2 N) of them). However, we're taking traces, which destroys the multiplicative structure, so it might be that Sturm's bound is the more relevant estimate. In light of that, perhaps we should take a sum over n rather than p?

Anyway, we can just compute it and see. I was mostly thinking that we might as well use the coefficients that we have. @jvoight Are there likely to be cases where it's feasible to compute up to 2^9 but not 2000?

Re: hash size, 64 bits has the advantage of being easily coded in C, without the need for gmp. Also, 128 bits would mean storing the hash as a string, right? I'm not saying that these factors are prohibitive, but we should consider whether 64 bits (actually 61 at present) is really insufficient before charging ahead. (Also, I think it's an odd combination to push for a longer hash but skimp on the hashing data.)

@AndrewVSutherland (Member) commented:

@arbooker

  • Using a_n's would definitely be better in terms of uniqueness, but it would force us to compute all the small a_p's, which could be annoying -- recall that we started the hash at larger primes specifically so we could compute it at an early stage where we hadn't necessarily determined all the bad Euler factors.

  • Modern C compilers support 128-bit arithmetic, e.g. gcc provides the built-in type __int128. You are correct that in MongoDB we would need to store a 128-bit hash as a string, but I don't think that is a big deal -- exact-match lookups on an indexed string should still be quite fast.

  • As far as sufficiency, with 61 bits the birthday paradox says that even in the best case (a perfectly uniform hash function) we will have a better than 50/50 chance of a collision once we have around 2 billion L-functions. This may be more L-functions than we are ever going to want to store, but the chance of a collision will be non-negligible much sooner (even with the 10-20 million L-functions we currently have it is not unthinkable). I guess the trade-off is between accepting that we may get a few collisions and being prepared to deal with them, or using a hash that is large enough to make a collision astronomically unlikely.
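For concreteness, the birthday-bound approximation behind these figures can be checked directly (a back-of-the-envelope sketch, assuming a perfectly uniform hash):

```python
import math

def collision_probability(n, bits):
    """Approximate probability of at least one collision among n
    uniformly random values in a space of size 2**bits."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * 2.0**bits))

print(collision_probability(2e9, 61))   # ~0.58: past 50/50 around 2 billion
print(collision_probability(2e7, 61))   # ~9e-5: small, but not unthinkable
```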

@arbooker (Contributor) commented:

@AndrewVSutherland

  • We can do the same thing with a sum over n, i.e. restrict to (n,Q)=1, where Q is the product of small primes (see the first sketch after this list).

  • We would need a 128-bit mulmod. Doesn't that require a 128x128 -> 256 multiply? (Of course there are tricks if P is a Mersenne prime -- see the second sketch after this list -- but by that point I would think that calling gmp is easier.) In my implementation, I'm already using __int128 to compute the 61-bit hash.

  • This hash does not distinguish individual forms in a Galois orbit. Modulo Galois, I think we will have far fewer than a billion forms.
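Two quick sketches for the first two points above. First, the (n,Q)=1 restriction (illustrative interval; sympy is used for the prime ranges). For this interval the surviving n are exactly the primes, since a composite coprime to every prime below 100 would have least prime factor at least 101 and hence exceed 2000:

```python
from math import gcd, prod
from sympy import primerange

lo, hi = 100, 2000
Q = prod(primerange(2, lo))   # product of the "small" primes below the interval
ns = [n for n in range(lo, hi + 1) if gcd(n, Q) == 1]
assert ns == list(primerange(lo, hi + 1))   # here (n,Q)=1 just means n prime
```

Second, the Mersenne-prime mulmod trick, written in Python but mirroring what a C implementation would do with an unsigned __int128 product (a sketch of the trick, not the actual implementation):

```python
P = 2**61 - 1   # Mersenne prime, so 2^61 ≡ 1 (mod P)

def mulmod61(a, b):
    """a*b mod P for 0 <= a, b < P, without a general 128x128 -> 256
    multiply: fold the high bits of the 122-bit product back onto the
    low bits, using 2^61 ≡ 1 (mod P)."""
    wide = a * b                     # in C: (unsigned __int128)a * b
    r = (wide & P) + (wide >> 61)    # first fold: now r < 2^62
    r = (r & P) + (r >> 61)          # second fold: now r <= P + 1
    return r - P if r >= P else r
```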

@AndrewVSutherland (Member) commented Jun 21, 2017:

@arbooker Sorry, I clearly need another cup of coffee -- yes, we would need a 128-bit mulmod, and yes, it would be a pain to do that in C (but I don't see requiring GMP as a big deal; heck, you need GMP to build gcc). And you make a good point about the number of L-functions modulo Galois being smaller.

Regarding the a_n: if we used [2^5,2^9], restricting to (n,Q)=1 would be the same as restricting to n prime (and likewise for the interval [100,2000] you suggested). Maybe we should consider [2^3,2^9]; that would give us 135 a_n's.

@jwbober (Member) commented Jun 21, 2017 via email

@davidfarmer (Member) commented Jun 21, 2017 via email

@AndrewVSutherland (Member) commented:

@jwbober Using two 64-bit hash functions is a good idea. We should make one as easy to compute as possible, with the expectation that in some cases there may not be enough a_p's available to compute the second.

@davidfarmer The original intention of the hash (which was for our internal use) was to facilitate provisional matching of L-functions (e.g. to identify isogenous Jacobians, or to recognize isogeny factors) in the early stages, where we might not yet know the conductor, root number, or some bad Euler factors. We typically would know the things that depend only on the "type" of the L-function, such as the Gamma-factors, degree, and weight (e.g. we know it is the L-function of a genus g curve). But we have continued to find the hash useful in the production database.

For example, the L-function of the elliptic curve http://www.lmfdb.org/EllipticCurve/2.0.3.1/324.0.18/a/ is not currently in the LMFDB. But one can easily compute its hash, which is 1904600861308595235. If you go to http://www.lmfdb.org/Genus2Curve/Q/ and type #1904600861308595235 into the search-by-label box, you will find the genus 2 curve http://www.lmfdb.org/Genus2Curve/Q/2916/a/ whose L-function has the same hash (in fact its Jacobian is isogenous to the restriction of scalars of the elliptic curve above, something we can now prove but couldn't at the time we originally computed these hashes).

So if a researcher has an arithmetic L-function that is not obviously in the LMFDB (e.g. it is the L-function of some strange motive they cooked up), they can compute its hash and check whether there is an object in the LMFDB that appears to have the same L-function. To facilitate this use case, we would ideally like a way to provisionally identify matching L-functions even when the researcher doesn't yet know much about their L-function; in particular, they might not know the conductor or be able to compute any zeros or special values, but if they can match it to an L-function in the LMFDB, they can at least provisionally obtain this information.

@arbooker (Contributor) commented:

I would prefer to think of this as a hash for modular forms (or maybe motives) rather than L-functions. Accordingly, it could be an attribute of a Galois conjugacy class of modular forms or isogeny class of elliptic curves, etc., but would not appear in the L-functions collection. For L-functions, @davidfarmer and I came up with a different sort of hash based on zeros, and I'd like to stick with that.

@AurelPage (Member) commented Jun 21, 2017:

@arbooker Does the trace really destroy everything? Doesn't it just correspond to taking the product of the conjugates of the L-function, so that the same bound applies? From a Galois point of view, this is just taking a Galois representation of dimension n defined over a field of degree d (say over Q_p) and viewing it as a Galois representation of dimension nd (defined over Q_p). Then we should still expect that O(log^2 N) values of Tr(a_p) characterize this representation of dimension nd.

@AndrewVSutherland (Member) commented:

@arbooker I don't care what we call it, and I am not suggesting that it is in any way canonical or should supersede the L-function hash based on zeros (which is far more general). But I think it does make sense to have some sort of "arithmetic hash" based on a_p values stored in the Lfunctions collection, at least for L-functions with an Euler product (what I think David calls "arithmetic" L-functions), so that we have a centralized way to search for it -- one can then browse all the instances of the L-function to find the corresponding arithmetic/analytic objects. We don't want to force the user to do a separate search for the hash in every collection of objects that have an arithmetic L-function.

@arbooker (Contributor) commented:

@AurelPage The problem with this argument is that the O-constant need not be uniform in the dimension, and in fact it might scale quite badly. Yes, we could multiply out all the Galois conjugate L-functions and apply Rankin-Selberg+GRH, but the product has conductor N^d, not N. That's probably worse than the Sturm bound for large d.

@AurelPage (Member) commented:

@arbooker I see, thanks for the explanation!

@arbooker (Contributor) commented:

@AndrewVSutherland Sure, there's no harm in having an optional extra field in the L-functions collection.

We'll need to change the proposed definition if we want to continue using the hash to match genus 2 curves and elliptic curves over quadratic fields. (The current proposal views the latter as an Euler product over Q(sqrt(d)) rather than Q.) Is that preferable?

@edgarcosta (Member) commented:

I think having both an arithmetic hash and an analytic hash for L-functions would be a good solution.

From what I understand, L-functions in the LMFDB are always taken as an Euler product over QQ, so from the L-function perspective it seems more natural to have a hash that thinks of the Euler product over QQ. However, if we go down that road, it might be nontrivial to distinguish between Galois conjugates.

@jvoight (Member, Author) commented Jun 22, 2017:

> @jvoight Are there likely to be cases where it's feasible to compute up to 2^9 but not 2000?

The Hecke operators for GL_2 are O(N(pp)) for fixed field degree, so for a given form it is only about four times as much work to compute the T_pp out to N(pp) = 2000 as out to N(pp) = 2^9. I'd like to think that we'll have at least this much data for any (Hilbert) modular form for which we hope to compute something about the L-function, and I suppose it's not strictly necessary that we be able to compute the hash for objects with a dramatic paucity of Hecke data.

@edgarcosta (Member) commented:

We have a column trace_hash for CMFs. We supposedly populate this column whenever we have exact data, i.e.,

  • dim <= 20 for weight > 1,
  • and any form of weight 1 (but see below).

The latter isn't true at the moment:

SELECT COUNT(*) from mf_newforms where trace_hash is Null and weight = 1;
 count
-------
  1580
(1 row)

@AndrewVSutherland, do you know why?

@AndrewVSutherland (Member) commented:

@edgarcosta Because we did not compute exact eigenvalues a_n for n in [2^13,2^14] for the A4, S4, A5 forms. These could be added without too much effort, but it will take some time -- it might be easier to just wait until @jwj61 computes all the Artin reps for these (if that is likely to happen soon).
